I'm writing a book.

I've got the page numbers done, 

so now I just have to fill in the rest. 

— Stephen Wright 



About These Notes 

These are lecture notes that I wrote for the course "Algorithms and Models of Computation" 
at the University of Illinois, Urbana-Champaign for the first time in Fall 2014. This course is a 
broad introduction to theoretical computer science, aimed at third-year computer science and 
computer engineering majors, that covers both fundamental topics in algorithms, for which I 
already have copious notes, and fundamental topics on formal languages and automata, for 
which I wrote the notes you are reading now. 

The most recent revision of these notes (or nearly so) is available online at
http://www.cs.illinois.edu/~jeffe/teaching/algorithms/, along with my algorithms notes and a near-complete
archive of past homeworks and exams from all my theoretical computer science classes. I plan to 
revise and reorganize these whenever I teach this material, so you may find more recent versions 
on the web page of whatever course I am currently teaching. 

About the Exercises 

Each note ends with several exercises, many of which I used in homeworks, discussion sections, 
or exams. *Stars indicate more challenging problems (which I have not used in homeworks, 
discussion sections, or exams). Many of these exercises were contributed by my amazing teaching 
assistants: 

Alex Steiger, Chao Xu, Connor Clark, Gail Steitz, Grant Czajkowski, Hsien-Chih 
Chang, Junqing Deng, Nick Bachmair, and Tana Wattanawaroon 

Please do not ask me for solutions to the exercises. If you are a student, seeing the solution 
will rob you of the experience of solving the problem yourself, which is the only way to learn the 
material. If you are an instructor, you shouldn't ask your students to solve problems that you 
can't solve yourself. (I don't always follow my own advice, so I'm sure some of the problems are 
buggy.) 

Caveat Lector! 

These notes are best viewed as an unfinished first draft. You should assume the notes 
contain several major errors, in addition to the usual unending supply of typos, fencepost errors, 
off-by-one errors, and brain farts. Before Fall 2014, I had not taught this material in more than
two decades. Moreover, the course itself is still very new — Lenny Pitt and I developed the 
course and offered the first pilot in Spring 2014 (with Lenny presenting the formal language 
material) — so even the choice of which material to emphasize, sketch, or exclude is still very 
much in flux. 

I would sincerely appreciate feedback of any kind, especially bug reports. 
Thanks, and enjoy! 



-Jeff 
THOMAS GODFREY, a self-taught mathematician, great in his way, and afterward inventor of what is
now called Hadley's Quadrant. But he knew little out of his way, and was not a pleasing companion; 
as, like most great mathematicians I have met with, he expected universal precision in everything 
said, or was forever denying or distinguishing upon trifles, to the disturbance of all conversation. 
He soon left us. 

— Benjamin Franklin, Memoirs, Part 1 (1771) 
describing one of the founding members of the Junto 

I hope the reader sees that the alphabet can be understood by any intelligent being who has any
one of the five senses left him, — by all rational men, that is, excepting the few eyeless deaf persons
who have lost both taste and smell in some complete paralysis. ... Whales in the sea can telegraph
as well as senators on land, if they will only note the difference between long spoutings and short
ones. ... A tired listener at church, by properly varying his long yawns and his short ones, may
express his opinion of the sermon to the opposite gallery before the sermon is done.

— Edward Everett Hale, "The Dot and Line Alphabet", Atlantic Monthly (October 1858)

If indeed, as Hilbert asserted, mathematics is a meaningless game played with meaningless marks 
on paper, the only mathematical experience to which we can refer is the making of marks on paper. 

— Eric Temple Bell, The Queen of the Sciences (1931) 



1 Strings 

Throughout this course, we will discuss dozens of algorithms and computational models that 
manipulate sequences: one-dimensional arrays, linked lists, blocks of text, walks in graphs, 
sequences of executed instructions, and so on. Ultimately the input and output of any algorithm 
must be representable as a finite string of symbols — the raw contents of some contiguous portion 
of the computer's memory. Reasoning about computation requires reasoning about strings. 

This note lists several formal definitions and formal induction proofs related to strings. These 
definitions and proofs are intentionally much more detailed than normally used in practice — most 
people's intuition about strings is fairly accurate — but the extra precision is necessary for any 
sort of formal proof. It may be helpful to think of this material as part of the "assembly language" 
of theoretical computer science. We normally think about computation at a much higher level 
of abstraction, but ultimately every argument must "compile" down to these (and similar) 
definitions. 

1.1 Definitions 

Fix an arbitrary finite set Σ called the alphabet; the elements of Σ are called symbols or
characters. As a notational convention, I will always use lower-case letters near the start of
the English alphabet (a, b, c, ...) as symbol variables, and never as explicit symbols. For explicit
symbols, I will always use fixed-width upper-case letters (A, B, C, ...), digits (0, 1, 2, ...), or
other symbols (⋄, $, #, •, ...) that are clearly distinguishable from variables.

A string (or word) over Σ is a finite sequence of zero or more symbols from Σ. Formally, a
string w over Σ is defined recursively as either

• the empty string, denoted by the Greek letter ε (epsilon), or

• an ordered pair (a, x), where a is a symbol in Σ and x is a string over Σ.

© Copyright 2014 Jeff Erickson. 
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/).
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision.
We normally write either a · x or simply ax to denote the ordered pair (a, x). Similarly, we
normally write explicit strings as sequences of symbols instead of nested ordered pairs; for
example, STRING is convenient shorthand for the formal expression (S, (T, (R, (I, (N, (G, ε)))))).
As a notational convention, I will always use lower-case letters near the end of the alphabet
(..., w, x, y, z) to represent unknown strings, and SHOUTY⋄MONOSPACED⋄TEXT to represent explicit
symbols and (non-empty) strings.

The set of all strings over Σ is denoted Σ* (pronounced "sigma star"). It is very important to
remember that every element of Σ* is a finite string, although Σ* itself is an infinite set containing
strings of every possible finite length.

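The length |w| of a string w is the number of symbols in w, defined formally as follows:

\[
|w| := \begin{cases} 0 & \text{if } w = \varepsilon, \\ 1 + |x| & \text{if } w = ax. \end{cases}
\]
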
For example, the string SEVEN has length 5. Although they are formally different objects, we do 
not normally distinguish between symbols and strings of length 1. 

The concatenation of two strings x and y, denoted either x • y or simply xy, is the
unique string containing the characters of x in order, followed by the characters of y in order.
For example, the string NOWHERE is the concatenation of the strings NOW and HERE; that is,
NOW • HERE = NOWHERE. (On the other hand, HERE • NOW = HERENOW.) Formally, concatenation is
defined recursively as follows:

\[
w \bullet z := \begin{cases} z & \text{if } w = \varepsilon, \\ a \cdot (x \bullet z) & \text{if } w = ax. \end{cases}
\]

(Here I'm using a larger dot • to formally distinguish the operator that concatenates two arbitrary
strings from the operator · that builds a string from a single character and a string.)

When we describe the concatenation of more than two strings, we normally omit all dots 
and parentheses, writing wxyz instead of (w • (x • y)) • z, for example. This simplification is 
justified by the fact (which we will prove shortly) that • is associative. 
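
As a rough illustration, recursive concatenation can be written with the same two cases as the definition of a string. This sketch again assumes the hypothetical nested-pair encoding (ε as the empty tuple, a · x as the pair (a, x)); the helper names are illustrative only.

```python
EMPTY = ()  # hypothetical encoding of ε; a·x is the pair (a, x)

def from_text(s):
    """Build the nested-pair form of an ordinary Python string."""
    w = EMPTY
    for a in reversed(s):
        w = (a, w)
    return w

def to_text(w):
    """Flatten a nested-pair string back into an ordinary Python string."""
    out = []
    while w != EMPTY:
        a, w = w
        out.append(a)
    return "".join(out)

def concat(w, z):
    """w • z = z if w = ε, and a·(x • z) if w = a·x."""
    if w == EMPTY:
        return z
    a, x = w
    return (a, concat(x, z))

now, here = from_text("NOW"), from_text("HERE")
print(to_text(concat(now, here)))  # NOWHERE
print(to_text(concat(here, now)))  # HERENOW
```

Note that concat recurses only on its first argument, which is why the proofs about concatenation use induction on the first string alone.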

1.2 Induction on Strings 

Induction is the standard technique for proving statements about recursively defined objects. 
Hopefully you are already comfortable proving statements about natural numbers via induction, 
but induction is actually a far more general technique. Several different variants of induction
can be used to prove statements about more general structures; here I describe the variant 
that I recommend (and actually use in practice). This variant follows two primary design 
considerations: 

• The case structure of the proof should mirror the case structure of the recursive definition.
For example, if you are proving something about all strings, your proof should have
two cases: Either w = ε, or w = ax for some symbol a and string x.

• The inductive hypothesis should be as strong as possible. The (strong) inductive hypothesis
for statements about natural numbers is always "Assume there is no counterexample k
such that k < n." I recommend adopting a similar inductive hypothesis for strings: "Assume
there is no counterexample x such that |x| < |w|." Then for the case w = ax, we have
|x| = |w| − 1 < |w| by definition of |w|, so the inductive hypothesis applies to x.




Thus, string-induction proofs have the following boilerplate structure. Suppose we want to prove
that every string is perfectly cromulent, whatever that means. The white boxes, shown here as [ ... ], hide additional
proof details that, among other things, depend on the precise definition of "perfectly cromulent".

Proof: Let w be an arbitrary string.
Assume, for every string x such that |x| < |w|, that x is perfectly cromulent.
There are two cases to consider.

• Suppose w = ε.
  [ ... ]
  Therefore, w is perfectly cromulent.

• Suppose w = ax for some symbol a and string x.
  The induction hypothesis implies that x is perfectly cromulent.
  [ ... ]
  Therefore, w is perfectly cromulent.

In both cases, we conclude that w is perfectly cromulent. □



Here are three canonical examples of this proof structure. When developing proofs in this 
style, I strongly recommend first mindlessly writing the green text (the boilerplate) with lots of 
space for each case, then filling in the red text (the actual theorem and the induction hypothesis), 
and only then starting to actually think. 

Lemma 1.1. For every string w, we have w • ε = w.

Proof: Let w be an arbitrary string. Assume that x • ε = x for every string x such that |x| < |w|.
There are two cases to consider:

• Suppose w = ε.

\begin{align*}
w \bullet \varepsilon &= \varepsilon \bullet \varepsilon && \text{because } w = \varepsilon, \\
&= \varepsilon && \text{by definition of concatenation,} \\
&= w && \text{because } w = \varepsilon.
\end{align*}

• Suppose w = ax for some symbol a and string x.

\begin{align*}
w \bullet \varepsilon &= (a \cdot x) \bullet \varepsilon && \text{because } w = ax, \\
&= a \cdot (x \bullet \varepsilon) && \text{by definition of concatenation,} \\
&= a \cdot x && \text{by the inductive hypothesis,} \\
&= w && \text{because } w = ax.
\end{align*}

In both cases, we conclude that w • ε = w. □

Lemma 1.2. Concatenation adds length: |w • x| = |w| + |x| for all strings w and x.


Proof: Let w and x be arbitrary strings. Assume that |y • x| = |y| + |x| for every string y such
that |y| < |w|. (Notice that we are using induction only on w, not on x.) There are two cases to
consider:

• Suppose w = ε.

\begin{align*}
|w \bullet x| &= |\varepsilon \bullet x| && \text{because } w = \varepsilon \\
&= |x| && \text{by definition of } \bullet \\
&= |\varepsilon| + |x| && |\varepsilon| = 0 \text{ by definition of } |\cdot| \\
&= |w| + |x| && \text{because } w = \varepsilon
\end{align*}

• Suppose w = ay for some symbol a and string y.

\begin{align*}
|w \bullet x| &= |ay \bullet x| && \text{because } w = ay \\
&= |a \cdot (y \bullet x)| && \text{by definition of } \bullet \\
&= 1 + |y \bullet x| && \text{by definition of } |\cdot| \\
&= 1 + |y| + |x| && \text{by the inductive hypothesis} \\
&= |ay| + |x| && \text{by definition of } |\cdot| \\
&= |w| + |x| && \text{because } w = ay
\end{align*}

In both cases, we conclude that |w • x| = |w| + |x|. □

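Lemma 1.3. Concatenation is associative: (w • x) • y = w • (x • y) for all strings w, x, and y.

Proof: Let w, x, and y be arbitrary strings. Assume that (z • x) • y = z • (x • y) for every string
z such that |z| < |w|. (Again, we are using induction only on w.) There are two cases to consider.

• Suppose w = ε.

\begin{align*}
(w \bullet x) \bullet y &= (\varepsilon \bullet x) \bullet y && \text{because } w = \varepsilon \\
&= x \bullet y && \text{by definition of } \bullet \\
&= \varepsilon \bullet (x \bullet y) && \text{by definition of } \bullet \\
&= w \bullet (x \bullet y) && \text{because } w = \varepsilon
\end{align*}

• Suppose w = az for some symbol a and some string z.

\begin{align*}
(w \bullet x) \bullet y &= (az \bullet x) \bullet y && \text{because } w = az \\
&= (a \cdot (z \bullet x)) \bullet y && \text{by definition of } \bullet \\
&= a \cdot ((z \bullet x) \bullet y) && \text{by definition of } \bullet \\
&= a \cdot (z \bullet (x \bullet y)) && \text{by the inductive hypothesis} \\
&= az \bullet (x \bullet y) && \text{by definition of } \bullet \\
&= w \bullet (x \bullet y) && \text{because } w = az
\end{align*}

In both cases, we conclude that (w • x) • y = w • (x • y). □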



This is not the only boilerplate that one can use for induction proofs on strings. For example, 
we can modify the inductive case analysis using the following observation: A non-empty string w 
is either a single symbol or the concatenation of two non-empty strings, which (by Lemma 1.2) 
must be shorter than w. Here is a proof of Lemma 1.3 that uses this alternative recursive structure: 






Models of Computation 



Lecture 1: Strings [Fa'14] 



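Proof: Let w, x, and y be arbitrary strings. Assume that (z • x') • y' = z • (x' • y') for all strings
x', y', and z such that |z| < |w|. (We need a stronger induction hypothesis here than in the
previous proofs!) There are three cases to consider.

• Suppose w = ε.

\begin{align*}
(w \bullet x) \bullet y &= (\varepsilon \bullet x) \bullet y && \text{because } w = \varepsilon \\
&= x \bullet y && \text{by definition of } \bullet \\
&= \varepsilon \bullet (x \bullet y) && \text{by definition of } \bullet \\
&= w \bullet (x \bullet y) && \text{because } w = \varepsilon
\end{align*}

• Suppose w is equal to some symbol a.

\begin{align*}
(w \bullet x) \bullet y &= (a \bullet x) \bullet y && \text{because } w = a \\
&= (a \cdot x) \bullet y && \text{because } a \bullet z = a \cdot z \text{ by definition of } \bullet \\
&= a \cdot (x \bullet y) && \text{by definition of } \bullet \\
&= a \bullet (x \bullet y) && \text{because } a \bullet z = a \cdot z \text{ by definition of } \bullet \\
&= w \bullet (x \bullet y) && \text{because } w = a
\end{align*}

• Suppose w = uv for some nonempty strings u and v.

\begin{align*}
(w \bullet x) \bullet y &= ((u \bullet v) \bullet x) \bullet y && \text{because } w = uv \\
&= (u \bullet (v \bullet x)) \bullet y && \text{by the inductive hypothesis, because } |u| < |w| \\
&= u \bullet ((v \bullet x) \bullet y) && \text{by the inductive hypothesis, because } |u| < |w| \\
&= u \bullet (v \bullet (x \bullet y)) && \text{by the inductive hypothesis, because } |v| < |w| \\
&= (u \bullet v) \bullet (x \bullet y) && \text{by the inductive hypothesis, because } |u| < |w| \\
&= w \bullet (x \bullet y) && \text{because } w = uv
\end{align*}

In all three cases, we conclude that (w • x) • y = w • (x • y). □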



1.3 Indices, Substrings, and Subsequences 

For any string w and any integer 1 ≤ i ≤ |w|, the expression w_i denotes the ith symbol in w,
counting from left to right. More formally, w_i is recursively defined as follows:

\[
w_i := \begin{cases} a & \text{if } w = ax \text{ and } i = 1, \\ x_{i-1} & \text{if } w = ax \text{ and } i > 1. \end{cases}
\]

As one might reasonably expect, w_i is formally undefined if i < 1 or w = ε, and therefore (by
induction) if i > |w|. The integer i is called the index of w_i.

We sometimes write strings as a concatenation of their constituent symbols using this
subscript notation: w = w_1 w_2 ⋯ w_{|w|}. While standard, this notation is slightly misleading, since
it incorrectly suggests that the string w contains at least three symbols, when in fact w could be a
single symbol or even the empty string.
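
The recursive definition of the ith symbol can be sketched in the same style. This assumes the hypothetical nested-pair encoding (ε as the empty tuple, a · x as the pair (a, x)); the function name symbol_at is illustrative, and raising IndexError is just one way to model "formally undefined".

```python
EMPTY = ()  # hypothetical encoding of ε; a·x is the pair (a, x)

def from_text(s):
    """Build the nested-pair form of an ordinary Python string."""
    w = EMPTY
    for a in reversed(s):
        w = (a, w)
    return w

def symbol_at(w, i):
    """w_i = a if w = ax and i = 1; x_{i-1} if w = ax and i > 1."""
    if w == EMPTY or i < 1:
        raise IndexError("w_i is undefined")
    a, x = w
    return a if i == 1 else symbol_at(x, i - 1)

w = from_text("STRING")
print(symbol_at(w, 1))  # S
print(symbol_at(w, 6))  # G
```

An index larger than |w| eventually recurses down to ε, so it hits the "undefined" case without any explicit length check.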

In actual code, subscripts are usually expressed using the bracket notation w[i]. Brackets
were introduced as a typographical convention over a hundred years ago because subscripts and
superscripts¹ were difficult or impossible to type.² We sometimes write strings as explicit arrays
w[1 .. n], with the understanding that n = |w|. Again, this notation is potentially misleading;
always remember that n might be zero; the string/array could be empty.

A substring of a string w is another string obtained from w by deleting zero or more symbols
from the beginning and from the end. Formally, a string y is a substring of w if and only if
there are strings x and z such that w = xyz. Extending the array notation for strings, we write
w[i .. j] to denote the substring of w starting at w_i and ending at w_j. More formally, we define

\[
w[i \,..\, j] := \begin{cases} \varepsilon & \text{if } j < i, \\ w_i \cdot w[i+1 \,..\, j] & \text{otherwise.} \end{cases}
\]


A proper substring of w is any substring other than w itself. For example, LAUGH is a proper 
substring of SLAUGHTER. Whenever y is a (proper) substring of w, we also call w a (proper) 
superstring of y. 

A prefix of w[1 .. n] is any substring of the form w[1 .. j]. Equivalently, a string p is a prefix
of another string w if and only if there is a string x such that px = w. A proper prefix of w is
any prefix except w itself. For example, DIE is a proper prefix of DIET.

Similarly, a suffix of w[1 .. n] is any substring of the form w[i .. n]. Equivalently, a string s is a
suffix of a string w if and only if there is a string x such that xs = w. A proper suffix of w is any
suffix except w itself. For example, YES is a proper suffix of EYES, and HE is both a proper prefix
and a proper suffix of HEADACHE.
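
The "there is a string x such that ..." characterizations of prefixes, suffixes, and substrings translate into short checks on ordinary Python strings. This is only a sketch; the function names are illustrative, and the code favors closeness to the definitions over speed.

```python
def is_prefix(p, w):
    """p is a prefix of w iff p·x = w for some string x."""
    return w[:len(p)] == p

def is_suffix(s, w):
    """s is a suffix of w iff x·s = w for some string x."""
    return len(s) <= len(w) and w[len(w) - len(s):] == s

def is_substring(y, w):
    """y is a substring of w iff w = x·y·z: try every choice of x."""
    return any(is_prefix(y, w[i:]) for i in range(len(w) + 1))

def is_proper(relation, y, w):
    """A proper prefix/suffix/substring of w is one other than w itself."""
    return relation(y, w) and y != w

print(is_proper(is_substring, "LAUGH", "SLAUGHTER"))  # True
print(is_proper(is_prefix, "DIE", "DIET"))            # True
print(is_proper(is_suffix, "YES", "EYES"))            # True
```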

A subsequence of a string w is a string obtained by deleting zero or more symbols from
anywhere in w. More formally, z is a subsequence of w if and only if

• z = ε, or

• w = ax for some symbol a and some string x such that z is a subsequence of x, or

• w = ax and z = ay for some symbol a and some strings x and y, and y is a subsequence
of x.



A proper subsequence of w is any subsequence of w other than w itself. Whenever z is a (proper) 
subsequence of w, we also call w a (proper) supersequence of z. 
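
The three-case recursive definition of a subsequence can be transcribed almost literally. The sketch below works on ordinary Python strings and uses an illustrative function name; because it tries both recursive cases, it can take exponential time on long inputs, so it should be read as an executable definition rather than an efficient algorithm.

```python
def is_subsequence(z, w):
    """Follow the recursive definition of 'z is a subsequence of w'."""
    if z == "":                      # case 1: z = ε
        return True
    if w == "":                      # w = ε but z is non-empty: no case applies
        return False
    a, x = w[0], w[1:]               # w = ax
    if is_subsequence(z, x):         # case 2: z is a subsequence of x
        return True
    # case 3: z = ay and y is a subsequence of x
    return z[0] == a and is_subsequence(z[1:], x)

print(is_subsequence("METAL", "MEATBALL"))  # True
print("METAL" in "MEATBALL")                # False: not a contiguous substring
```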

¹ The same bracket notation is also used for bibliographic references, instead of the traditional footnote/endnote
superscripts, for exactly the same reasons. 

² A typewriter is an obsolete mechanical device loosely resembling a computer keyboard. Pressing a key on a
typewriter moves a lever (called a "typebar") that strikes a cloth ribbon full of ink against a piece of paper, leaving the 
image of a single character. Many historians believe that the ordering of letters on modern keyboards (QWERTYUIOP) 
evolved in the late 1800s, reaching its modern form on the 1874 Sholes & Glidden Type-Writer™, in part to separate 
many common letter pairs, to prevent typebars from jamming against each other; this is also why the keys on most 
modern keyboards are arranged in a slanted grid. (The common folk theory that the ordering was deliberately 
intended to slow down typists doesn't withstand careful scrutiny.) A more recent theory suggests that the ordering 
was influenced by telegraph³ operators, who found older alphabetic arrangements confusing, in part because of
ambiguities in American Morse Code. 

³ A telegraph is an obsolete electromechanical communication device consisting of an electrical circuit with a
switch at one end and an electromagnet at the other. The sending operator would press and release a key, closing and 
opening the circuit, originally causing the electromagnet to push a stylus onto a moving paper tape, leaving marks 
that could be decoded by the receiving operator. (Operators quickly discovered that they could directly decode the 
clicking sounds made by the electromagnet, and so the paper tape became obsolete almost immediately.) The most 
common scheme within the US to encode symbols, developed by Alfred Vail and Samuel Morse in 1837, used (mostly) 
short (•) and long (— ) marks — now called "dots" and "dashes", or "dits" and "dahs" — separated by gaps of various 
lengths. American Morse code (as it became known) was ambiguous; for example, the letter Z and the string SE were 
both encoded by the sequence • • •  • ("di-di-dit, dit"). This ambiguity has been blamed for the S key's position on the




Substrings and subsequences are not the same objects; don't confuse them! Every substring
of w is also a subsequence of w, but not every subsequence is a substring. For example, METAL is 
a subsequence, but not a substring, of MEATBALL. To emphasize the distinction, we sometimes 
redundantly refer to substrings of w as contiguous substrings, meaning all their symbols appear 
together in w. 



Exercises

Most of the following exercises ask for proofs of various claims about strings. For each claim, give
a complete, self-contained, formal proof by inductive definition-chasing, using the boilerplate
structure recommended in Section 1.2. You can use Lemmas 1.1, 1.2, and 1.3, but don't assume
any other facts about strings that you have not actually proved. Do not use the words "obvious"
or "clearly" or "just". Most of these claims are in fact obvious; the real exercise is understanding
why they're obvious.

1. For any symbol a and any string w, let #(a, w) denote the number of occurrences of a in 
w. For example, #(A, BANANA) = 3 and #(X, FLIBBERTIGIBBET) = 0. 

(a) Give a formal recursive definition of the function # : Σ × Σ* → ℕ.

(b) Prove that #(a, xy) = #(a, x) + #(a, y) for every symbol a and all strings x and y . 
Your proof must rely on both your answer to part (a) and the formal recursive 
definition of string concatenation. 

2. Recursively define a set L of strings over the alphabet {0, 1} as follows:

• The empty string ε is in L.

• For any two strings x and y in L, the string 0x1y0 is also in L.

• These are the only strings in L.

(a) Prove that the string 000010101010010100 is in L.

(b) Prove by induction that every string in L has exactly twice as many 0s as 1s. (You may
assume the identity #(a, xy) = #(a, x) + #(a, y) for any symbol a and any strings x
and y; see Exercise 1(b).)

(c) Give an example of a string with exactly twice as many 0s as 1s that is not in L.

3. For any string w and any non-negative integer n, let w^n denote the string obtained by
concatenating n copies of w; more formally, we define

\[
w^n := \begin{cases} \varepsilon & \text{if } n = 0, \\ w \bullet w^{n-1} & \text{otherwise.} \end{cases}
\]

For example, (BLAH)^5 = BLAHBLAHBLAHBLAHBLAH and ε^374 = ε.

Prove that w^m • w^n = w^{m+n} for every string w and all non-negative integers n
and m.

typewriter keyboard near E and Z. 

Vail and Morse were of course not the first people to propose encoding symbols as strings of bits. That honor 
apparently falls to Francis Bacon, who devised a five-bit binary encoding of the alphabet (except for the letters J and U) 
in 1605 as the basis for a steganographic code, a method of hiding secret messages in otherwise normal text.



4. Let w be an arbitrary string, and let n = |w|. Prove each of the following statements.

(a) w has exactly n + 1 prefixes.

(b) w has exactly n proper suffixes.

(c) w has at most n(n + 1)/2 distinct substrings.

(d) w has at most 2^n − 1 proper subsequences.

5. The reversal w^R of a string w is defined recursively as follows:

\[
w^R := \begin{cases} \varepsilon & \text{if } w = \varepsilon, \\ x^R \bullet a & \text{if } w = ax. \end{cases}
\]

(a) Prove that |w^R| = |w| for every string w.

(b) Prove that (wx)^R = x^R w^R for all strings w and x.

(c) Prove that (w^R)^n = (w^n)^R for every string w and every integer n ≥ 0. (See Exercise 3.)

(d) Prove that (w^R)^R = w for every string w.

6. Let w be an arbitrary string, and let n = |w|. Prove the following statements for all indices
1 ≤ i ≤ j ≤ k ≤ n.

(a) |w[i .. j]| = j − i + 1

(b) w[i .. j] • w[j + 1 .. k] = w[i .. k]

(c) w^R[i .. j] = (w[i' .. j'])^R where i' = |w| + 1 − j and j' = |w| + 1 − i

7. A palindrome is a string that is equal to its reversal. 

(a) Give a recursive definition of a palindrome over the alphabet S. 

(b) Prove that any string p meets your recursive definition of a palindrome if and only if
p = p^R.


8. A string w ∈ Σ* is called a shuffle of two strings x, y ∈ Σ* if at least one of the following
recursive conditions is satisfied:

• w = x = y = ε.

• w = aw' and x = ax' and w' is a shuffle of x' and y, for some a ∈ Σ and some
strings w' and x'.

• w = aw' and y = ay' and w' is a shuffle of x and y', for some a ∈ Σ and some
strings w' and y'.

For example, the string bANanANanASa is a shuffle of the strings BANANA and ANANAS.

(a) Prove that if w is a shuffle of x and y, then |w| = |x| + |y|.

(b) Prove that if w is a shuffle of x and y, then w^R is a shuffle of x^R and y^R.








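9. Consider the following pair of mutually recursive functions on strings:

\[
\mathit{evens}(w) := \begin{cases} \varepsilon & \text{if } w = \varepsilon \\ \mathit{odds}(x) & \text{if } w = ax \end{cases}
\qquad
\mathit{odds}(w) := \begin{cases} \varepsilon & \text{if } w = \varepsilon \\ a \cdot \mathit{evens}(x) & \text{if } w = ax \end{cases}
\]

(a) Prove the following identity for all strings w and x:

\[
\mathit{evens}(w \bullet x) = \begin{cases} \mathit{evens}(w) \bullet \mathit{evens}(x) & \text{if } |w| \text{ is even,} \\ \mathit{evens}(w) \bullet \mathit{odds}(x) & \text{if } |w| \text{ is odd.} \end{cases}
\]

(b) State and prove a similar identity for odds(w • x).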

10. For any positive integer n, the Fibonacci string F_n is defined recursively as follows:

\[
F_n := \begin{cases} 0 & \text{if } n = 1, \\ 1 & \text{if } n = 2, \\ F_{n-2} \bullet F_{n-1} & \text{otherwise.} \end{cases}
\]

For example, F_6 = 10101101 and F_7 = 0110110101101.

(a) Prove that for every integer n ≥ 2, the string F_n can also be obtained from F_{n−1} by
replacing every occurrence of 0 with 1 and replacing every occurrence of 1 with 01.
More formally, prove that F_n = Finc(F_{n−1}), where

\[
\mathit{Finc}(w) := \begin{cases} \varepsilon & \text{if } w = \varepsilon \\ 1 \cdot \mathit{Finc}(x) & \text{if } w = 0x \\ 01 \bullet \mathit{Finc}(x) & \text{if } w = 1x \end{cases}
\]

[Hint: First prove that Finc(x • y) = Finc(x) • Finc(y).]

(b) Prove that 00 and 111 are not substrings of any Fibonacci string F_n.

11. Prove that the following three properties of strings are in fact identical.

• A string w ∈ {0, 1}* is balanced if it satisfies one of the following conditions:

- w = ε,

- w = 0x1 for some balanced string x, or

- w = xy for some balanced strings x and y.

• A string w ∈ {0, 1}* is erasable if it satisfies one of the following conditions:

- w = ε, or

- w = x01y for some strings x and y such that xy is erasable. (The strings x and
y are not necessarily erasable.)

• A string w ∈ {0, 1}* is conservative if it satisfies both of the following conditions:

- w has an equal number of 0s and 1s, and

- no prefix of w has more 1s than 0s.

(a) Prove that every balanced string is erasable.

(b) Prove that every erasable string is conservative.

(c) Prove that every conservative string is balanced.

[Hint: To develop intuition, it may be helpful to think of 0s as left brackets and 1s as right
brackets, but don't invoke this intuition in your proofs.]

12. A string w ∈ {0, 1}* is equitable if it has an equal number of 0s and 1s.

(a) Prove that a string w is equitable if and only if it satisfies one of the following
conditions:

• w = ε,

• w = 0x1 for some equitable string x,

• w = 1x0 for some equitable string x, or

• w = xy for some equitable strings x and y.

(b) Prove that a string w is equitable if and only if it satisfies one of the following
conditions:

• w = ε,

• w = x01y for some strings x and y such that xy is equitable, or

• w = x10y for some strings x and y such that xy is equitable.

In the last two cases, the individual strings x and y are not necessarily equitable.

(c) Prove that a string w is equitable if and only if it satisfies one of the following
conditions:

• w = ε,

• w = xy for some balanced string x and some equitable string y, or

• w = x^R y for some balanced string x and some equitable string y.

(See the previous exercise for the definition of "balanced".) 
Caveat lector: This is the first edition of this lecture note. Please send bug reports and
suggestions to jeffe@illinois.edu. 



But the Lord came down to see the city and the tower the people were building. The Lord 
said, "If as one people speaking the same language they have begun to do this, then nothing 
they plan to do will be impossible for them. Come, let us go down and confuse their language 
so they will not understand each other."

— Genesis 11:6-7 (New International Version) 

Soyez réglé dans votre vie et ordinaire comme un bourgeois,
afin d'être violent et original dans vos œuvres.

[Be regular and orderly in your life like a bourgeois, 
so that you may be violent and original in your work.] 

— Gustave Flaubert, in a letter to Gertrude Tennant (December 25, 1876) 

Some people, when confronted with a problem, think "I know, I'll use regular expressions." 
Now they have two problems. 

— Jamie Zawinski, alt.religion.emacs (August 12, 1997)

I define UNIX as 30 definitions of regular expressions living under one roof.

— Donald Knuth, Digital Typography (1999) 

2 Regular Languages 

2.1 Languages 

A formal language (or just a language) is a set of strings over some finite alphabet Σ, or
equivalently, an arbitrary subset of Σ*. For example, each of the following sets is a language:

• The empty set ∅.¹

• The set {ε}.

• The set {0, 1}*.

• The set {THE, OXFORD, ENGLISH, DICTIONARY}.

• The set of all subsequences of THE⋄OXFORD⋄ENGLISH⋄DICTIONARY.

• The set of all words in The Oxford English Dictionary.

• The set of all strings in {0, 1}* with an odd number of 1s.

• The set of all strings in {0, 1}* that represent a prime number in base 13.

• The set of all sequences of turns that solve the Rubik's cube (starting in some fixed
configuration)

• The set of all Python programs that print "Hello World!"

As a notational convention, I will always use italic upper-case letters (usually L, but also A, B, C, 
and so on) to represent languages. 

¹ The empty set symbol ∅ derives from the Norwegian letter Ø, pronounced like a sound of disgust or a German ö,
and not from the Greek letter φ. Calling the empty set "fie" or "fee" makes the baby Jesus cry.

Formal languages are not "languages" in the same sense that English, Klingon, and Python are
"languages". Strings in a formal language do not necessarily carry any "meaning", nor are they 
necessarily assembled into larger units ("sentences" or "paragraphs" or "packages") according to 
some "grammar". 

It is very important to distinguish between three "empty" objects. Many beginning students 
have trouble keeping these straight. 

• ∅ is the empty language, which is a set containing zero strings. ∅ is not a string.

• {ε} is a language containing exactly one string, which has length zero. {ε} is not empty,
and it is not a string.

• ε is the empty string, which is a sequence of length zero. ε is not a language.

2.2 Building Languages

Languages can be combined and manipulated just like any other sets. Thus, if A and B are
languages over Σ, then their union A ∪ B, intersection A ∩ B, difference A \ B, and symmetric
difference A ⊕ B are also languages over Σ, as is the complement Ā := Σ* \ A. However, there are
two more useful operators that are specific to sets of strings.

The concatenation of two languages A and B, again denoted A • B or just AB, is the set of
all strings obtained by concatenating an arbitrary string in A with an arbitrary string in B:

A • B := {xy | x ∈ A and y ∈ B}.

For example, if A = {HOCUS, ABRACA} and B = {POCUS, DABRA}, then 

A • B = {HOCUSPOCUS, ABRACAPOCUS, HOCUSDABRA, ABRACADABRA}. 

In particular, for every language A, we have 

∅ • A = A • ∅ = ∅ and {ε} • A = A • {ε} = A.
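
For finite languages, concatenation is a one-line set comprehension. The sketch below represents a language as a Python set of ordinary strings, with "" standing for ε; the function name concat_languages is illustrative.

```python
def concat_languages(A, B):
    """A • B = { xy | x in A and y in B }, for finite languages A and B."""
    return {x + y for x in A for y in B}

A = {"HOCUS", "ABRACA"}
B = {"POCUS", "DABRA"}
print(sorted(concat_languages(A, B)))
# ['ABRACADABRA', 'ABRACAPOCUS', 'HOCUSDABRA', 'HOCUSPOCUS']

print(concat_languages(set(), A))      # set(): the empty language absorbs
print(concat_languages({""}, A) == A)  # True: {ε} is the identity
```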

The Kleene closure or Kleene star² of a language L, denoted L*, is the set of all strings obtained
by concatenating a sequence of zero or more strings from L. For example, {0, 11}* = {ε, 0, 00, 11,
000, 011, 110, 0000, 0011, 0110, 1100, 1111, 00000, 00011, 00110, ..., 011110011011, ...}. More
formally, L* is defined recursively as the set of all strings w such that either

• w = ε, or

• w = xy, for some strings x ∈ L and y ∈ L*.

This definition immediately implies that

∅* = {ε}* = {ε}.

For any other language L, the Kleene closure L* is infinite and contains arbitrarily long (but
finite!) strings. Equivalently, L* can also be defined as the smallest superset of L that contains
the empty string ε and is closed under concatenation (hence "closure"). The set of all strings
Σ* is, just as the notation suggests, the Kleene closure of the alphabet Σ (where each symbol is
viewed as a string of length 1).
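Because L* is usually infinite, a program can only list a finite piece of it. The sketch below follows the recursive definition (ε is in L*, and so is xy whenever x is in L and y is in L*), but keeps only strings up to a chosen length. The function name kleene_star_up_to and the length cutoff are our own assumptions.

```python
def kleene_star_up_to(L: set[str], max_len: int) -> set[str]:
    """Return every string in L* whose length is at most max_len."""
    result = {""}            # w = ε is always in L*
    frontier = {""}
    while frontier:
        # Prepend one more string from L to every string found in the last round.
        frontier = {x + y for x in L for y in frontier
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

print(sorted(kleene_star_up_to({"0", "11"}, 4), key=lambda w: (len(w), w)))
```

With max_len = 4 this prints the first twelve strings of the example {0, 11}*, shortest first.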

² After Stephen Kleene, who pronounced his last name "clay-knee", not "clean" or "cleanie" or "claynuh" or
"dimaggio".

A useful variant of the Kleene closure operator is the Kleene plus, defined as L⁺ := L • L*.
Thus, L⁺ is the set of all strings obtained by concatenating a sequence of one or more strings
from L.

The following identities, which we state here without (easy) proofs, are useful for designing, 
simplifying, and understanding languages. 

Lemma 2.1. The following identities hold for all languages A, B, and C: 

(a) ∅A = A∅ = ∅.

(b) εA = Aε = A.

(c) A + B = B + A.

(d) (A + B) + C = A + (B + C).

(e) (AB)C = A(BC).

(f) A(B + C) = AB + AC.

Lemma 2.2. The following identities hold for every language L:

(a) L* = ε + L⁺ = L*L* = (L + ε)* = (L \ ε)* = ε + L + L⁺L⁺.

(b) L⁺ = L* \ ε = LL* = L*L = L⁺L* = L*L⁺ = L + L⁺L⁺.

(c) L⁺ = L* if and only if ε ∈ L.

Lemma 2.3 (Arden's Rule). For any languages A, B, and L such that L = AL + B, we have
A*B ⊆ L. Moreover, if A does not contain the empty string, then L = AL + B if and only if L = A*B.

2.3 Regular Languages and Regular Expressions 

A language L is regular if and only if it satisfies one of the following (recursive) conditions: 

• L is empty; 

• L contains a single string (which could be the empty string ε);

• L is the union of two regular languages; 

• L is the concatenation of two regular languages; or 

• L is the Kleene closure of a regular language. 

Regular languages are normally described using more compact notation, which omits braces 
around one-string sets, uses + to represent union instead of ∪, and juxtaposes subexpressions to
represent concatenation instead of using an explicit operator • ; the resulting string of symbols is 
called a regular expression. By convention, in the absence of parentheses, the * operator has 
highest precedence, followed by the (implicit) concatenation operator, followed by +. Thus, for 
example, the regular expression 10* is shorthand for the language {1} • {0}* (containing all 
strings consisting of a 1 followed by zero or more 0s), and not the language {10}* (containing all 
strings of even length that start with 1 and alternate between 1 and 0) . As a larger example, the 
regular expression 

0 + 0*1(10*1 + 01*0)*10* 

represents the language 

{0} ∪ ({0}* • {1} • (({1} • {0}* • {1}) ∪ ({0} • {1}* • {0}))* • {1} • {0}*).

Here are a few more examples of regular expressions and the languages they represent.

• 0*: the set of all strings of 0s, including the empty string.

• 00000*: the set of all strings consisting of at least four 0s.

• (00000)*: the set of all strings of 0s whose length is a multiple of 5.

• (ε + 1)(01)*(ε + 0): the set of all strings of alternating 0s and 1s, or equivalently, the set
of all binary strings that do not contain the substrings 00 or 11.

• ((ε + 0 + 00 + 000)1)*(ε + 0 + 00 + 000): the set of all binary strings that do not contain
the substring 0000.

• ((0 + 1)(0 + 1))*: the set of all binary strings whose length is even.

• 1*(01*01*)*: the set of all binary strings with an even number of 0s.

• 0 + 1(0 + 1)*00: the set of all non-negative binary numerals divisible by 4 and with no
redundant leading 0s.

• 0 + 0*1(10*1 + 01*0)*10*: the set of all non-negative binary numerals divisible by 3,
possibly with redundant leading 0s.

The last example should not be obvious. It is straightforward, but rather tedious, to prove 
by induction that every string in 0 + 0*1(10*1 + 01*0)*10* is the binary representation of a 
non-negative multiple of 3. It is similarly straightforward, and similarly tedious, to prove that the 
binary representation of every non-negative multiple of 3 matches this regular expression. In a 
later note, we will see a systematic method for deriving regular expressions for some languages 
that avoids (or more accurately, automates) this tedium. 
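Short of a proof, the claim about multiples of 3 can at least be spot-checked by machine. The sketch below transcribes the expression into the syntax of Python's re module, which writes union as | instead of +, and compares it against ordinary arithmetic for small numbers. The constant and function names are ours, and a finite check like this is evidence, not a substitute for the induction.

```python
import re

# Python spells union as | rather than +; the rest carries over directly.
MULTIPLE_OF_3 = re.compile(r"0|0*1(10*1|01*0)*10*")

def matches(w: str) -> bool:
    """Does the whole string w match the expression?"""
    return MULTIPLE_OF_3.fullmatch(w) is not None

for n in range(1000):
    numeral = format(n, "b")          # binary numeral without leading 0s
    assert matches(numeral) == (n % 3 == 0), numeral
print("checked all numerals below 1000")
```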

Most of the time we do not distinguish between regular expressions and the languages they 
represent, for the same reason that we do not normally distinguish between the arithmetic 
expression "2+2" and the integer 4, or the symbol π and the area of the unit circle. However, we
sometimes need to refer to regular expressions themselves as strings. In those circumstances, we
write L(R) to denote the language represented by the regular expression R. String w matches
regular expression R if and only if w ∈ L(R). Two regular expressions R and R' are equivalent if
they describe the same language; for example, the regular expressions (0 + 1)* and (1 + 0)* are 
equivalent, because the union operator is commutative. 

Almost every regular language can be represented by infinitely many distinct but equivalent 
regular expressions, even if we ignore ultimately trivial equivalences like L = (L∅)*Lε + ∅.

2.4 Things What Ain't Regular Expressions 

Many computing environments and programming languages support patterns called regexen 
(singular regex, pluralized like ox) that are considerably more general and powerful than regular 
expressions. Regexen include special symbols representing negation, character classes (for 
example, upper-case letters, or digits), contiguous ranges of characters, line and word boundaries, 
limited repetition (as opposed to the unlimited repetition allowed by *), back-references to earlier 
subexpressions, and even local variables. Despite its obvious etymology, a regex is not necessarily 
a regular expression, and it does not necessarily describe a regular language!³

Another type of pattern that is often confused with regular expressions is the glob. Globs
are patterns used in most Unix shells and some scripting languages to represent sets of file
names. Globs include symbols for arbitrary single characters (?), single characters from a
specified range ([a-z]), arbitrary substrings (*), and substrings from a specified finite set
({foo,ba{r,z}}). Globs are significantly less powerful than regular expressions.

³ However, regexen are not all-powerful, either; see http://stackoverflow.com/a/1732454/775369.

2.5 Not Every Language is Regular 

You may be tempted to conjecture that all languages are regular, but in fact, the following
cardinality argument shows that almost all languages are not regular. To make the argument
concrete, let's consider languages over the single-symbol alphabet {0}.

• Every regular expression over the one-symbol alphabet {0} is itself a string over the 7-symbol
alphabet {∅, +, (, ), *, ε, 0}. By interpreting these symbols as the digits 1 through 7, we can
interpret any string over this larger alphabet as the base-8 representation of some unique
integer. Thus, the set of all regular expressions over {0} is at most as large as the set of
integers, and is therefore countably infinite. It follows that the set of all regular languages
over {0} is also countably infinite.

• On the other hand, for any real number 0 < α < 1, we can define a corresponding language

L_α = {0ⁿ | α2ⁿ mod 1 ≥ 1/2}.

In other words, L_α contains the string 0ⁿ if and only if the (n + 1)th bit in the binary
representation of α is equal to 1. For any distinct real numbers α ≠ β, the binary
representations of α and β must differ in some bit, so L_α ≠ L_β. We conclude that the set
of all languages over {0} is at least as large as the set of real numbers between 0 and 1,
and is therefore uncountably infinite.
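To see the correspondence between bits of α and strings in L_α concretely, here is a small sketch. It assumes α is rational so that the arithmetic is exact (Python's Fraction type); irrational α would need a different representation. The function name in_L_alpha is ours.

```python
from fractions import Fraction

def in_L_alpha(alpha: Fraction, n: int) -> bool:
    """Is the string 0^n in L_α, that is, is α·2^n mod 1 at least 1/2?"""
    return (alpha * 2**n) % 1 >= Fraction(1, 2)

alpha = Fraction(5, 8)                            # 0.101 in binary
print([in_L_alpha(alpha, n) for n in range(4)])   # follows the bits 1, 0, 1, 0
```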

We will see several explicit examples of non-regular languages in future lectures. For example, 
the set of all regular expressions over {0, 1} is not itself a regular language! 

2.6 Parsing Regular Expressions

Most algorithms for regular expressions require them in the form of regular expression
trees, rather than as raw strings. A regular expression tree is one of the following:

• A leaf node labeled ∅.

• A leaf node labeled with a string in Σ*.

• A node labeled + with two children, each the root of an expression tree. 

• A node labeled * with one child, which is the root of an expression tree. 

• A node labeled • with two children, each the root of an expression tree. 

In other words, a regular expression tree directly encodes a sequence of alternation, 
concatenation and Kleene closure operations that defines a regular language. Similarly, when 
we want to prove things about regular expressions or regular languages, it is more natural to 
think of subexpressions as subtrees rather than as substrings. 



Given any regular expression of length n, we can parse it into an equivalent regular 
expression tree in O(n) time. Thus, when we see an algorithmic problem that starts "Given a 
regular expression. . . ", we can assume without loss of generality that we're actually given a 
regular expression tree. 
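As one illustration of linear-time parsing, here is a recursive-descent sketch that respects the precedence rules from Section 2.3 (star binds tightest, then concatenation, then +). It is only one possible approach, not necessarily the one developed later in these notes. We assume single-character symbols, the characters ε and ∅ for the empty string and empty set, and a nested-tuple encoding of trees; all of these choices, and the name parse, are ours.

```python
def parse(regex: str):
    """Parse a regular expression string into a nested-tuple expression tree."""
    tokens = [c for c in regex if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_union():                      # lowest precedence: +
        nonlocal pos
        node = parse_concat()
        while peek() == "+":
            pos += 1
            node = ("+", node, parse_concat())
        return node

    def parse_concat():                     # implicit concatenation
        node = parse_star()
        while peek() is not None and peek() not in "+)":
            node = ("•", node, parse_star())
        return node

    def parse_star():                       # highest precedence: *
        nonlocal pos
        node = parse_atom()
        while peek() == "*":
            pos += 1
            node = ("*", node)
        return node

    def parse_atom():
        nonlocal pos
        c = peek()
        if c is None or c in "+*)":
            raise ValueError(f"unexpected {c!r} at position {pos}")
        pos += 1
        if c == "(":
            node = parse_union()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        # Leaves: the empty set, or a string (ε becomes the empty string).
        return ("∅",) if c == "∅" else ("leaf", "" if c == "ε" else c)

    tree = parse_union()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} at position {pos}")
    return tree

print(parse("10*"))     # concatenation of leaf 1 with the star of leaf 0
```

Each token is consumed exactly once, so the running time is linear in the length of the expression.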

We'll see more on this topic later. 



Exercises

1. (a) Prove that {ε} • L = L • {ε} = L, for any language L.

(b) Prove that ∅ • L = L • ∅ = ∅, for any language L.

(c) Prove that (A • B) • C = A • (B • C), for all languages A, B, and C.

(d) Prove that |A • B| ≤ |A| • |B|, for all languages A and B. (The second • is multiplication!)

(e) Prove that L* is finite if and only if L = ∅ or L = {ε}.

(f) Prove that AB = BC implies A*B = BC* = A*BC*, for all languages A, B, and C.

2. Recall that the reversal wᴿ of a string w is defined recursively as follows:

wᴿ := ε          if w = ε
wᴿ := xᴿ • a     if w = a • x

The reversal Lᴿ of any language L is the set of reversals of all strings in L:

Lᴿ := {wᴿ | w ∈ L}.

(a) Prove that (AB)ᴿ = BᴿAᴿ for all languages A and B.

(b) Prove that (Lᴿ)ᴿ = L for every language L.

(c) Prove that (L*)ᴿ = (Lᴿ)* for every language L.

3. Prove that each of the following regular expressions is equivalent to (0 + 1)*. 

(a) ε + 0(0 + 1)* + 1(1 + 0)*

(b) 0* + 0*1(0 + 1)*

(c) ((ε + 0)(ε + 1))*

(d) 0*(10*)*

(e) (1*0)*(0*1)*

4. For each of the following languages in {0, 1}*, describe an equivalent regular expression. 
There are infinitely many correct answers for each language. (This problem will become 
significantly simpler after we've seen finite-state machines, in the next lecture note.) 

(a) Strings that end with the suffix 0⁹ = 000000000.

(b) All strings except 010. 

(c) Strings that contain the substring 010. 

(d) Strings that contain the subsequence 010. 

(e) Strings that do not contain the substring 010. 

(f) Strings that do not contain the subsequence 010. 

(g) Strings that contain an even number of occurrences of the substring 010. 

*(h) Strings that contain an even number of occurrences of the substring 000. 

(i) Strings in which every occurrence of the substring 00 appears before every occurrence 
of the substring 11. 

(j) Strings w such that in every prefix of w, the number of 0s and the number of 1s differ
by at most 1.

*(k) Strings w such that in every prefix of w, the number of 0s and the number of 1s differ
by at most 2.

*(l) Strings in which the number of 0s and the number of 1s differ by a multiple of 3.

*(m) Strings that contain an even number of 1s and an odd number of 0s.

*(n) Strings that represent a number divisible by 5 in binary.

5. Prove that for any regular expression R such that L(R) is nonempty, there is a regular 
expression equivalent to R that does not use the empty-set symbol ∅.

6. Prove that if L is a regular language, then Lᴿ is also a regular language. [Hint: How do
you reverse a regular expression?]

7. (a) Describe and analyze an efficient algorithm to determine, given a regular expression R,
whether L(R) is empty.

(b) Describe and analyze an efficient algorithm to determine, given a regular expression R,
whether L(R) is infinite.

In each problem, assume you are given R as a regular expression tree, not just a raw string.

Caveat lector! This is the first edition of this lecture note. A few topics are missing, and
there are almost certainly a few serious errors. Please send bug reports and suggestions to 
jeffe@illinois.edu. 



Life only avails, not the having lived. Power ceases in the instant of repose; 
it resides in the moment of transition from a past to a new state, 
in the shooting of the gulf, in the darting to an aim. 

— Ralph Waldo Emerson, "Self Reliance", Essays, First Series (1841) 

O Marvelous! what new configuration will come next?

I am bewildered with multiplicity.

— William Carlos Williams, "At Dawn" (1914) 



3 Finite-State Machines 
3.1 Intuition 

Suppose we want to determine whether a given string w[1 .. n] of bits represents a multiple of 5
in binary. After a bit of thought, you might realize that you can read the bits in w one at a time,
from left to right, keeping track of the value modulo 5 of the prefix you have read so far.

MultipleOf5(w[1 .. n]):
    rem ← 0
    for i ← 1 to n
        rem ← (2 • rem + w[i]) mod 5
    if rem = 0
        return True
    else
        return False

Aside from the loop index i, which we need just to read the entire input string, this algorithm
has a single local variable rem, which has only five different values (0, 1, 2, 3, or 4).

This algorithm already runs in O(n) time, which is the best we can hope for (after all, we
have to read every bit in the input), but we can speed up the algorithm in practice. Let's define a
change or transition function δ : {0, 1, 2, 3, 4} × {0, 1} → {0, 1, 2, 3, 4} as follows:

δ(q, a) = (2q + a) mod 5.

(Here I'm implicitly converting the symbols 0 and 1 to the corresponding integers 0 and 1.) Since
we already know all values of the transition function, we can store them in a precomputed table,
and then replace the computation in the main loop of MultipleOf5 with a simple array lookup.

We can also modify the return condition to check for different values modulo 5. To be 
completely general, we replace the final if-then-else lines with another array lookup, using an 
array A[0..4] of booleans describing which final mod-5 values are "acceptable". 

After both of these modifications, our algorithm can be rewritten as follows, either iteratively 
or recursively (with q = 0 in the initial call) : 

DoSomethingCool(w[1 .. n]):
    q ← 0
    for i ← 1 to n
        q ← δ[q, w[i]]
    return A[q]

DoSomethingCool(q, w):
    if w = ε
        return A[q]
    else
        decompose w = a • x
        return DoSomethingCool(δ(q, a), x)

© Copyright 2014 Jeff Erickson.
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/).
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision.



If we want to use our new DoSomethingCool algorithm to implement MultipleOf5, we simply
give the arrays δ and A the following hard-coded values:

q    δ[q,0]    δ[q,1]    A[q]
0    0         1         True
1    2         3         False
2    4         0         False
3    1         2         False
4    3         4         False
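The table-driven algorithm translates almost line for line into real code. The following Python sketch assumes the input is given as a string of the characters 0 and 1; the names DELTA, ACCEPT, and do_something_cool are ours.

```python
# Precomputed tables for MultipleOf5: DELTA[q][a] = (2q + a) mod 5.
DELTA = [[(2 * q + a) % 5 for a in (0, 1)] for q in range(5)]
ACCEPT = [q == 0 for q in range(5)]

def do_something_cool(w: str) -> bool:
    """Iterative DoSomethingCool: one table lookup per input symbol."""
    q = 0
    for symbol in w:
        q = DELTA[q][int(symbol)]
    return ACCEPT[q]

print(DELTA)                             # rows match the table above
print(do_something_cool("00101110110"))  # 374 is not a multiple of 5
```

Changing only the contents of ACCEPT makes the same loop test for any other set of remainders modulo 5.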



We can also visualize the behavior of DoSomethingCool by drawing a directed graph, whose
vertices represent possible values of the variable q (the possible states of the algorithm) and
whose edges are labeled with input symbols to represent transitions between states. Specifically,
the graph includes the labeled directed edge p -a→ q if and only if δ(p, a) = q. To indicate the
proper return value, we draw the "acceptable" final states using doubled circles. Here is the
resulting graph for MultipleOf5:

[Figure: State-transition graph for MultipleOf5]

If we run the MultipleOf5 algorithm on the string 00101110110 (representing the number
374 in binary), the algorithm performs the following sequence of transitions:

0 -0→ 0 -0→ 0 -1→ 1 -0→ 2 -1→ 0 -1→ 1 -1→ 3 -0→ 1 -1→ 3 -1→ 2 -0→ 4

Because the final state is not the "acceptable" state 0, the algorithm correctly returns False.
We can also think of this sequence of transitions as a walk in the graph, which is completely
determined by the start state 0 and the sequence of edge labels; the algorithm returns True if
and only if this walk ends at an "acceptable" state.



3.2 Formal Definitions 

The object we have just described is an example of a finite-state machine. A finite-state machine 
is a formal model of any system/machine/algorithm that can exist in a finite number of states 
and that transitions among those states based on a sequence of input symbols.

Finite-state machines are also commonly called deterministic finite-state automata, abbreviated
DFAs. The word "deterministic" means that the behavior of the machine is completely
determined by the input string; we'll discuss nondeterministic automata in the next lecture.
The word "automaton" (plural "automata") comes from ancient Greek αὐτόματος, meaning
"self-acting", from the roots αὐτο- ("self") and -ματος ("thinking, willing", the root of Latin
mentus).

Formally, every finite-state machine consists of five components: 

• An arbitrary finite set Σ, called the input alphabet.

• Another arbitrary finite set Q, whose elements are called states.

• An arbitrary transition function δ : Q × Σ → Q.

• A start state s ∈ Q.

• A subset A ⊆ Q of accepting states.

The behavior of a finite-state machine is governed by an input string w, which is a finite
sequence of symbols from the input alphabet Σ. The machine reads the symbols in w one at a
time in order (from left to right). At all times, the machine has a current state q; initially q is
the machine's start state s. Each time the machine reads a symbol a from the input string, its
current state transitions from q to δ(q, a). After all the characters have been read, the machine
accepts w if the current state is in A and rejects w otherwise. In other words, every finite-state
machine runs the algorithm DoSomethingCool! The language of a finite-state machine M,
denoted L(M), is the set of all strings that M accepts.

More formally, we extend the transition function δ : Q × Σ → Q of any finite-state machine
to a function δ* : Q × Σ* → Q that transitions on strings as follows:

δ*(q, w) := q                   if w = ε,
δ*(q, w) := δ*(δ(q, a), x)      if w = ax.

Finally, a finite-state machine accepts a string w if and only if δ*(s, w) ∈ A, and rejects w
otherwise. (Compare this definition with the recursive formulation of DoSomethingCool!)

For example, our final MultipleOf5 algorithm is a DFA with the following components:

• input alphabet: Σ = {0, 1}

• state set: Q = {0, 1, 2, 3, 4}

• transition function: δ(q, a) = (2q + a) mod 5

• start state: s = 0

• accepting states: A = {0}

This machine rejects the string 00101110110, because 



δ*(0, 00101110110) = δ*(δ(0, 0), 0101110110)
    = δ*(0, 0101110110) = δ*(δ(0, 0), 101110110)
    = δ*(0, 101110110) = δ*(δ(0, 1), 01110110) = ···
    ··· = δ*(1, 110) = δ*(δ(1, 1), 10)
    = δ*(3, 10) = δ*(δ(3, 1), 0)
    = δ*(2, 0) = δ*(δ(2, 0), ε)
    = δ*(4, ε) = 4 ∉ A.
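The five-component definition and the extended transition function can be written down directly in code. This is a minimal sketch, assuming states and symbols are arbitrary Python values and the transition function is an ordinary callable; the class name DFA and its field names are our own. The recursion in delta_star mirrors the definition symbol for symbol, so it is only suitable for short inputs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class DFA:
    states: frozenset
    alphabet: frozenset
    delta: Callable      # transition function δ : Q × Σ → Q
    start: object
    accepting: frozenset

    def delta_star(self, q, w: str):
        """Extended transition: δ*(q, ε) = q and δ*(q, ax) = δ*(δ(q, a), x)."""
        if w == "":
            return q
        return self.delta_star(self.delta(q, w[0]), w[1:])

    def accepts(self, w: str) -> bool:
        return self.delta_star(self.start, w) in self.accepting

multiple_of_5 = DFA(
    states=frozenset(range(5)),
    alphabet=frozenset("01"),
    delta=lambda q, a: (2 * q + int(a)) % 5,
    start=0,
    accepting=frozenset({0}),
)
print(multiple_of_5.delta_star(0, "00101110110"))   # final state of the run
```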

We have already seen a more graphical representation of this entire sequence of transitions:

0 -0→ 0 -0→ 0 -1→ 1 -0→ 2 -1→ 0 -1→ 1 -1→ 3 -0→ 1 -1→ 3 -1→ 2 -0→ 4

The arrow notation is easier to read and write for specific examples, but surprisingly, most people 
actually find the more formal functional notation easier to use in formal proofs. Try them both! 

We can equivalently define a DFA as a directed graph whose vertices are the states Q, whose
edges are labeled with symbols from Σ, such that every vertex has exactly one outgoing edge
with each label. In our drawings of finite-state machines, the start state s is always indicated
by an incoming arrow, and the accepting states A are always indicated by doubled circles. By
induction, for any string w ∈ Σ*, this graph contains a unique walk that starts at s and whose
edges are labeled with the symbols in w in order. The machine accepts w if this walk ends at an
accepting state. This graphical formulation of DFAs is incredibly useful for developing intuition
and even designing DFAs. For proofs, it's largely a matter of taste whether to write in terms of
extended transition functions or labeled graphs, but (as much as I wish otherwise) I actually find
it easier to write correct proofs using the functional formulation.



3.3 Another Example 

The following drawing shows a finite-state machine with input alphabet Σ = {0, 1}, state set
Q = {s, t}, start state s, a single accepting state t, and the transition function

δ(s, 0) = s,   δ(s, 1) = t,   δ(t, 0) = t,   δ(t, 1) = s.

[Figure: A simple finite-state machine.]

For example, the two-state machine M shown above accepts the string 00101110100
after the following sequence of transitions:

s -0→ s -0→ s -1→ t -0→ t -1→ s -1→ t -1→ s -0→ s -1→ t -0→ t -0→ t

The same machine M rejects the string 11101101 after the following sequence of transitions:

s -1→ t -1→ s -1→ t -0→ t -1→ s -1→ t -0→ t -1→ s

Finally, M rejects the empty string, because the start state s is not an accepting state. 

From these examples and others, it is easy to conjecture that the language of M is the set of
all strings of 0s and 1s with an odd number of 1s. So let's prove it!

Proof (tedious case analysis): Let #(a, w) denote the number of times symbol a appears in
string w. We will prove the following stronger claims, for any string w.

δ*(s, w) = s if #(1, w) is even, and δ*(s, w) = t if #(1, w) is odd;
δ*(t, w) = t if #(1, w) is even, and δ*(t, w) = s if #(1, w) is odd.

Let w be an arbitrary string. Assume that for any string x that is shorter than w, we have
δ*(s, x) = s and δ*(t, x) = t if x has an even number of 1s, and δ*(s, x) = t and δ*(t, x) = s if
x has an odd number of 1s. There are five cases to consider.

• If w = ε, then w contains an even number of 1s and δ*(s, w) = s and δ*(t, w) = t by
definition.

• Suppose w = 1x and #(1, w) is even. Then #(1, x) is odd, which implies

δ*(s, w) = δ*(δ(s, 1), x)     by definition of δ*
         = δ*(t, x)           by definition of δ
         = s                  by the inductive hypothesis

δ*(t, w) = δ*(δ(t, 1), x)     by definition of δ*
         = δ*(s, x)           by definition of δ
         = t                  by the inductive hypothesis

Since the remaining cases are similar, I'll omit the line-by-line justification.

• If w = 1x and #(1, w) is odd, then #(1, x) is even, so the inductive hypothesis implies

δ*(s, w) = δ*(δ(s, 1), x) = δ*(t, x) = t
δ*(t, w) = δ*(δ(t, 1), x) = δ*(s, x) = s

• If w = 0x and #(1, w) is even, then #(1, x) is even, so the inductive hypothesis implies

δ*(s, w) = δ*(δ(s, 0), x) = δ*(s, x) = s
δ*(t, w) = δ*(δ(t, 0), x) = δ*(t, x) = t

• Finally, if w = 0x and #(1, w) is odd, then #(1, x) is odd, so the inductive hypothesis
implies

δ*(s, w) = δ*(δ(s, 0), x) = δ*(s, x) = t
δ*(t, w) = δ*(δ(t, 0), x) = δ*(t, x) = s   □

Notice that this proof contains |Q|² • |Σ| + |Q| separate inductive arguments. For every pair of
states p and q, we must argue about the language of strings w such that δ*(p, w) = q, and we
must consider each first symbol in w. We must also argue about δ*(p, ε) for every state p. Each of
those arguments is typically straightforward, but it's easy to get lost in the deluge of cases.

For this particular proof, however, we can reduce the number of cases by switching from tail
recursion to head recursion. The following identity holds for all strings x ∈ Σ* and symbols
a ∈ Σ:

δ*(q, xa) = δ(δ*(q, x), a)

We leave the inductive proof of this identity as a straightforward exercise (hint, hint).

Proof (clever renaming, head induction): Let's rename the states 0 and 1 instead of s and t.
Then the transition function can be described concisely as δ(q, a) = (q + a) mod 2.

Now we claim that for every string w, we have δ*(0, w) = #(1, w) mod 2. So let w be
an arbitrary string, and assume that for any string x that is shorter than w, we have δ*(0, x) =
#(1, x) mod 2. There are only two cases to consider: either w is empty or it isn't.

• If w = ε, then δ*(0, w) = 0 = #(1, w) mod 2 by definition.

• Otherwise, w = xa for some string x and some symbol a, and we have

δ*(0, w) = δ(δ*(0, x), a)
         = δ(#(1, x) mod 2, a)            by the inductive hypothesis
         = (#(1, x) mod 2 + a) mod 2      by definition of δ
         = (#(1, x) + a) mod 2            by definition of mod 2
         = (#(1, x) + #(1, a)) mod 2      because #(1, 0) = 0 and #(1, 1) = 1
         = (#(1, xa)) mod 2               by definition of #
         = (#(1, w)) mod 2                because w = xa   □



Hmmm. This "clever" proof is certainly shorter than the earlier brute-force proof, but is it really 
"better"? "Simpler"? More intuitive? Easier to understand? I'm skeptical. Sometimes brute force 
really is more effective. 
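Brute force is also available to a computer. The sketch below checks, for every binary string of length at most 10, both the parity claim and the head-recursion identity used in the second proof. It assumes the renamed states 0 and 1 and represents strings over {0, 1} as Python strings; the function names are ours. An exhaustive check over short strings is a sanity test, not a proof.

```python
from itertools import product

def delta(q: int, a: str) -> int:
    """Transition function after renaming s → 0 and t → 1."""
    return (q + int(a)) % 2

def delta_star(q: int, w: str) -> int:
    """Extended transition function, defined by peeling off the first symbol."""
    return q if w == "" else delta_star(delta(q, w[0]), w[1:])

for n in range(11):
    for symbols in product("01", repeat=n):
        w = "".join(symbols)
        # Claim proved above: δ*(0, w) = #(1, w) mod 2.
        assert delta_star(0, w) == w.count("1") % 2
        # Head-recursion identity: δ*(q, xa) = δ(δ*(q, x), a).
        if w:
            x, a = w[:-1], w[-1]
            assert all(delta_star(q, w) == delta(delta_star(q, x), a)
                       for q in (0, 1))
print("both claims hold for every binary string of length at most 10")
```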

3.4 Yet Another Example 

As a more complex example, consider the Rubik's cube, a well-known mechanical puzzle invented
independently by Ernő Rubik in Hungary and Terutoshi Ishigi in Japan in the mid-1970s. This
puzzle has precisely 519,024,039,293,878,272,000 distinct configurations. In the unique solved
configuration, each of the six faces of the cube shows exactly one color. We can change the
configuration of the cube by rotating one of the six faces of the cube by 90 degrees, either clockwise
or counterclockwise. The cube has six faces (front, back, left, right, up, and down), so there
are exactly twelve possible turns, typically represented by the symbols R, L, F, B, U, D, R̄, L̄, F̄, B̄, Ū, D̄,
where the letter indicates which face to turn and the presence or absence of a bar over the letter
indicates turning counterclockwise or clockwise, respectively. Thus, we can represent a Rubik's
cube as a finite-state machine with 519,024,039,293,878,272,000 states and an input alphabet
with 12 symbols; or equivalently as a directed graph with 519,024,039,293,878,272,000 vertices,
each with 12 outgoing edges. In practice, the number of states is far too large for us to actually
draw the machine or explicitly specify its transition function; nevertheless, the number of states
is still finite. If we let the start state s and the sole accepting state be the solved state, then
the language of this finite-state machine is the set of all move sequences that leave the cube
unchanged.



[Figure: A complicated finite-state machine.]

3.5 Building DFAs

This section describes a few examples of building DFAs that accept particular languages, thereby
proving that those languages are automatic. As usual in algorithm design, there is no purely
mechanical recipe (no automatic method, no algorithm) for building DFAs in general. However,
the following examples show several useful design strategies.

3.5.1 Superstrings 

Perhaps the simplest rule of thumb is to try to construct an algorithm that looks like MultipleOf5:
a simple for-loop through the symbols, using a constant number of variables, where each variable
(except the loop index) has only a constant number of possible values. Here, "constant" means
an actual number that is not a function of the input size n. You should be able to compute the
number of possible values for each variable at compile time.

For example, the following algorithm determines whether a given string over Σ = {0, 1} contains
the substring 11.



Contains11(w[1 .. n]):
    found ← False
    for i ← 1 to n
        if i = 1
            last2 ← w[1]
        else
            last2 ← w[i−1] • w[i]
        if last2 = 11
            found ← True
    return found





Aside from the loop index, this algorithm has exactly two variables.

• A boolean flag found indicating whether we have seen the substring 11. This variable has
exactly two possible values: True and False.

• A string last2 containing the last (up to) two symbols we have read so far. This variable
has exactly 7 possible values: ε, 0, 1, 00, 01, 10, and 11.

Thus, altogether, the algorithm can be in at most 2 × 7 = 14 possible states, one for each possible
pair (found, last2). Thus, we can encode the behavior of Contains11 as a DFA with fourteen
states, where the start state is (False, ε) and the accepting states are all seven states of the form
(True, *). The transition function is described in the following table (split into two parts to save
space):



q              δ[q,0]         δ[q,1]         q              δ[q,0]         δ[q,1]
(False, ε)     (False, 0)     (False, 1)     (True, ε)      (True, 0)      (True, 1)
(False, 0)     (False, 00)    (False, 01)    (True, 0)      (True, 00)     (True, 01)
(False, 1)     (False, 10)    (True, 11)     (True, 1)      (True, 10)     (True, 11)
(False, 00)    (False, 00)    (False, 01)    (True, 00)     (True, 00)     (True, 01)
(False, 01)    (False, 10)    (True, 11)     (True, 01)     (True, 10)     (True, 11)
(False, 10)    (False, 00)    (False, 01)    (True, 10)     (True, 00)     (True, 01)
(False, 11)    (False, 10)    (True, 11)     (True, 11)     (True, 10)     (True, 11)



For example, given the input string 1001011100, this DFA performs the following sequence of
transitions and then accepts.

(False, ε) -1→ (False, 1) -0→ (False, 10) -0→ (False, 00) -1→
(False, 01) -0→ (False, 10) -1→ (False, 01) -1→
(True, 11) -1→ (True, 11) -0→ (True, 10) -0→ (True, 00)
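The fourteen-state machine does not need to be typed in by hand; its transition function can be computed from the meaning of the two variables. The sketch below assumes states are (found, last2) pairs with last2 stored as a Python string (the empty string standing for ε) and then compares the machine against a direct substring test. The names step and contains_11 are ours.

```python
from itertools import product

def step(state: tuple[bool, str], a: str) -> tuple[bool, str]:
    """One transition of the brute-force DFA whose states are (found, last2)."""
    found, last2 = state
    last2 = (last2 + a)[-2:]              # keep at most the last two symbols
    return (found or last2 == "11", last2)

def contains_11(w: str) -> bool:
    state = (False, "")                   # start state (False, ε)
    for a in w:
        state = step(state, a)
    return state[0]                       # accepting states are (True, *)

assert step((False, "1"), "1") == (True, "11")
assert all(contains_11("".join(s)) == ("11" in "".join(s))
           for n in range(12) for s in product("01", repeat=n))
print(contains_11("1001011100"))
```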

3.5.2 Reducing states 

You can probably guess that the brute-force DFA we just constructed has considerably more states
than necessary, especially after seeing its transition graph:

[Figure: Our brute-force DFA for strings containing the substring 11]

For example, we don't actually need to remember both of the last two symbols, but only the
penultimate symbol, because the last symbol is the one we're currently reading. This observation
allows us to reduce the number of states from fourteen to only six. Once the flag part of the state
is set to True, we know the machine will eventually accept, so we might as well merge the two
accepting states together. Finally, and more subtly, because all transitions out of (False, ε) and
(False, 0) are identical, we can merge those two states together as well. In the end, we obtain
the following DFA with just three states:

• The start state, which indicates that the machine has not read the substring 11 and did not
just read the symbol 1.

• An intermediate state, which indicates that the machine has not read the substring 11 but
just read the symbol 1.

• A unique accept state, which indicates that the machine has read the substring 11.

[Figure: A minimal DFA for superstrings of 11]

At the end of this note, I'll describe an efficient algorithm to transform any given DFA into an
equivalent DFA with the fewest possible states. Given that this minimization algorithm exists,
there is very little incentive to optimize DFAs by hand. Clarity is infinitely more important than
brevity, especially in this class.
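As a sanity check on the three-state machine described above, the following sketch writes out its transition table and compares it with a direct substring test on all short strings. The state numbers 0, 1, and 2 and the explicit transitions are our reading of the three state descriptions, since the drawing itself is not reproduced here.

```python
from itertools import product

# State 0: no 11 yet, did not just read 1.  State 1: no 11 yet, just read 1.
# State 2: has read 11 (the unique accepting state).
DELTA = {
    0: {"0": 0, "1": 1},
    1: {"0": 0, "1": 2},
    2: {"0": 2, "1": 2},
}

def accepts(w: str) -> bool:
    q = 0
    for a in w:
        q = DELTA[q][a]
    return q == 2

for n in range(13):
    for symbols in product("01", repeat=n):
        w = "".join(symbols)
        assert accepts(w) == ("11" in w)
print("three-state DFA agrees with a substring test up to length 12")
```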

3.5.3 Every this after that

Suppose we want to accept the set of strings in which every occurrence of the substring 00 occurs
after every occurrence of the substring 11. Equivalently, we want to reject every string in which
some 00 occurs before some 11. Often the easiest way to design a DFA to check whether a string is
not in some set is first to build a DFA that accepts that set and then invert which states in that
machine are accepting.

From the previous example, we know that there is a three-state DFA M₁₁ that accepts the set
of strings with the substring 11 and a nearly identical DFA M₀₀ that accepts the set of strings
containing the substring 00. By identifying the accept state of M₀₀ with the start state of M₁₁,
we obtain a five-state DFA that accepts the set of strings with 00 before 11. Finally, by inverting
which states are accepting, we obtain the DFA we want.

[Figure: Building a DFA for the language of strings in which every 00 is after every 11.]



3.5.4 Both This and That: The Product Construction 

Now suppose we want to accept all strings that contain both 00 and 11 as substrings, in either 
order. Intuitively, we'd like to run two of our earlier DFAs in parallel (the DFA $M_{00}$ to detect 
superstrings of 00 and the DFA $M_{11}$ to detect superstrings of 11) and then accept the input 
string if and only if both of these DFAs accept. In fact, we can encode precisely this "parallel 
computation" into a single DFA, whose states are all ordered pairs $(p,q)$, where $p$ is a state in 
$M_{00}$ and $q$ is a state in $M_{11}$. The new "parallel" DFA includes the transition 
$(p,q) \xrightarrow{a} (p',q')$ if and only if $M_{00}$ contains the transition $p \xrightarrow{a} p'$ 
and $M_{11}$ contains the transition $q \xrightarrow{a} q'$. Finally, the state $(p,q)$ is accepting 
if and only if $p$ and $q$ are accepting states in their respective machines. 
The resulting nine-state DFA is shown on the next page. 

More generally, let $M_1 = (\Sigma, Q_1, \delta_1, s_1, A_1)$ be an arbitrary DFA that accepts some language 
$L_1$, and let $M_2 = (\Sigma, Q_2, \delta_2, s_2, A_2)$ be an arbitrary DFA that accepts some language $L_2$ (over the 
same alphabet $\Sigma$). We can construct a third DFA $M = (\Sigma, Q, \delta, s, A)$ that accepts the intersection 
language $L_1 \cap L_2$ as follows. 

$$\begin{aligned}
Q &:= Q_1 \times Q_2 = \{(p,q) \mid p \in Q_1 \text{ and } q \in Q_2\} \\
s &:= (s_1, s_2) \\
A &:= A_1 \times A_2 = \{(p,q) \mid p \in A_1 \text{ and } q \in A_2\} \\
\delta((p,q), a) &:= (\delta_1(p,a),\ \delta_2(q,a))
\end{aligned}$$

Building a DFA for the language of strings in which every 00 is after every 11. 



To convince yourself that this product construction is actually correct, consider the extended 
transition function $\delta^*\colon (Q_1 \times Q_2) \times \Sigma^* \to (Q_1 \times Q_2)$, which acts on strings 
instead of individual symbols. Recall that this function is defined recursively as follows: 

$$\delta^*((p,q), w) := \begin{cases}
(p,q) & \text{if } w = \varepsilon, \\
\delta^*(\delta((p,q), a),\ x) & \text{if } w = ax.
\end{cases}$$



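Inductive definition-chasing gives us the identity $\delta^*((p,q), w) = (\delta_1^*(p,w),\ \delta_2^*(q,w))$ for any 
string $w$: 

$$\begin{aligned}
\delta^*((p,q), \varepsilon) &= (p,q) && \text{by the definition of } \delta^* \\
&= (\delta_1^*(p,\varepsilon),\ \delta_2^*(q,\varepsilon)) && \text{by the definitions of } \delta_1^* \text{ and } \delta_2^*; \\[1ex]
\delta^*((p,q), ax) &= \delta^*(\delta((p,q),a),\ x) && \text{by the definition of } \delta^* \\
&= \delta^*((\delta_1(p,a),\ \delta_2(q,a)),\ x) && \text{by the definition of } \delta \\
&= (\delta_1^*(\delta_1(p,a),x),\ \delta_2^*(\delta_2(q,a),x)) && \text{by the induction hypothesis} \\
&= (\delta_1^*(p,ax),\ \delta_2^*(q,ax)) && \text{by the definitions of } \delta_1^* \text{ and } \delta_2^*.
\end{aligned}$$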



It now follows from this seemingly impenetrable wall of notation that for any string $w$, we have 
$\delta^*(s, w) \in A$ if and only if both $\delta_1^*(s_1, w) \in A_1$ and $\delta_2^*(s_2, w) \in A_2$. In other words, $M$ accepts $w$ if 
and only if both $M_1$ and $M_2$ accept $w$, as required. 

As usual, this construction technique does not necessarily yield minimal DFAs. For example, 
in our first example of a product DFA, illustrated above, the central state (a, a) cannot be reached 
from any other state and is therefore redundant. Whatever. 

Similar product constructions can be used to build DFAs that accept any other boolean 
combination of languages; in fact, the only part of the construction that needs to be changed is 
the choice of accepting states. For example: 

• To accept the union $L_1 \cup L_2$, define $A = \{(p,q) \mid p \in A_1 \text{ or } q \in A_2\}$. 






• To accept the difference $L_1 \setminus L_2$, define $A = \{(p,q) \mid p \in A_1 \text{ but } q \notin A_2\}$. 

• To accept the symmetric difference $L_1 \oplus L_2$, define $A = \{(p,q) \mid p \in A_1 \text{ xor } q \in A_2\}$. 

Moreover, by cascading this product construction, we can construct DFAs that accept arbitrary 
boolean combinations of arbitrary finite collections of regular languages. 
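
The product construction is mechanical enough to sketch in a few lines of code. In the sketch below, a DFA is represented as a tuple (alphabet, states, delta, start, accepting) with delta stored as a dictionary; this representation, and the helper names product_dfa, run, and substring_dfa, are assumptions made here for illustration and do not appear elsewhere in these notes. The accept argument selects the boolean combination, so only the choice of accepting states changes.

```python
from itertools import product


def product_dfa(m1, m2, accept=lambda in1, in2: in1 and in2):
    """Product of two DFAs over the same alphabet.

    accept(in1, in2) decides whether the pair (p, q) is accepting, given
    whether p is accepting in m1 and whether q is accepting in m2.
    The default (logical and) yields the intersection.
    """
    sigma, q1, d1, s1, a1 = m1
    _, q2, d2, s2, a2 = m2
    states = set(product(q1, q2))
    delta = {((p, q), a): (d1[p, a], d2[q, a])
             for (p, q) in states for a in sigma}
    accepting = {(p, q) for (p, q) in states if accept(p in a1, q in a2)}
    return sigma, states, delta, (s1, s2), accepting


def run(m, w):
    """Return True if the DFA m accepts the string w."""
    sigma, states, delta, start, accepting = m
    state = start
    for a in w:
        state = delta[state, a]
    return state in accepting


def substring_dfa(c):
    """Three-state DFA for binary strings containing cc, where c is '0' or '1'."""
    other = "1" if c == "0" else "0"
    delta = {(0, c): 1, (0, other): 0,
             (1, c): 2, (1, other): 0,
             (2, "0"): 2, (2, "1"): 2}
    return "01", {0, 1, 2}, delta, 0, {2}


both = product_dfa(substring_dfa("0"), substring_dfa("1"))
either = product_dfa(substring_dfa("0"), substring_dfa("1"),
                     accept=lambda in1, in2: in1 or in2)
```

Here both has nine states and accepts exactly the strings containing both 00 and 11, while either accepts the union; passing a different accept function gives the difference or symmetric difference in the same way.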

3.6 Decision Algorithms 



It's unclear how much we can say here, since we haven't yet talked about graph algorithms, 
or even really about graphs. Perhaps this discussion should simply be moved to the graph- 
traversal notes. 

• Is $w \in L(M)$? Follow the unique path from $q_0$ with label $w$. By definition, $w \in L(M)$ if 
and only if this path leads to an accepting state. 

• Is $L(M)$ empty? The language $L(M)$ is empty if and only if no accepting state is 
reachable from $q_0$. This condition can be checked in $O(n)$ time via whatever-first search, 
where $n$ is the number of states (treating the alphabet size as a constant). Alternatively, but 
less usefully, $L(M) = \varnothing$ if and only if $L(M)$ contains no string $w$ such that $|w| < n$. 

• Is $L(M)$ finite? Remove all states unreachable from $q_0$, and all states from which no 
accepting state can be reached (via whatever-first search). Then $L(M)$ is finite if and only if 
what remains is a dag; this condition can be checked by depth-first search. Alternatively, but 
less usefully, $L(M)$ is finite if and only if $L(M)$ contains no string $w$ such that $n \le |w| < 2n$. 

• Is $L(M) = \Sigma^*$? Remove all states unreachable from $q_0$ (via whatever-first search). Then 
$L(M) = \Sigma^*$ if and only if every state in $M$ is an accepting state. 

• Is $L(M) = L(M')$? Build a DFA $N$ such that $L(N) = L(M) \oplus L(M')$, the symmetric 
difference of the two languages, using a standard product construction, and then check whether 
$L(N) = \varnothing$. (Checking only $L(M) \setminus L(M') = \varnothing$ would establish just $L(M) \subseteq L(M')$.) 
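
Several of these checks reduce to a reachability search. The following is a rough sketch under the same assumed representation as before: a DFA is a tuple (alphabet, states, delta, start, accepting), and the function names are hypothetical. The equivalence test explores the product of the two machines lazily and looks for a reachable pair in which exactly one component is accepting, which is the same as testing whether the symmetric difference is empty.

```python
from collections import deque


def reachable(sigma, delta, start):
    """Set of states reachable from start, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for a in sigma:
            q = delta[p, a]
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen


def is_empty(m):
    """L(M) is empty iff no accepting state is reachable."""
    sigma, _, delta, start, accepting = m
    return not (reachable(sigma, delta, start) & set(accepting))


def accepts_everything(m):
    """L(M) is all of Sigma* iff every reachable state is accepting."""
    sigma, _, delta, start, accepting = m
    return reachable(sigma, delta, start) <= set(accepting)


def equivalent(m1, m2):
    """L(M1) = L(M2) iff no reachable product state has exactly one
    accepting component (the symmetric difference is empty)."""
    sigma, _, d1, s1, a1 = m1
    _, _, d2, s2, a2 = m2
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in a1) != (q in a2):
            return False
        for a in sigma:
            nxt = (d1[p, a], d2[q, a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

Exploring the product lazily avoids building states that are never reached, but the worst case still visits every pair of states.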



3.7 Closure Properties 



*** 



We haven't proved that automatic languages are regular yet, so formally, for now, some 
of these are closure properties of automatic languages. 

• Complement (easy for DFAs, hard for regular expressions.) 

• Concatenation (trivial for regular expressions, hard for DFAs) 

• Union (trivial for regular expressions, easy for DFAs via product) 

• Intersection (hard for regular expressions, easy for DFAs via product) 

• Difference (hard for regular expressions, easy for DFAs via product) 

• Kleene star: wait for NFAs (trivial for regular expression, hard for DFAs) 

• Homomorphism: only mention in passing 

• Inverse homomorphism: only mention in passing 



3.8 Fooling Sets 

Fix an arbitrary language $L$ over an arbitrary alphabet $\Sigma$. For any strings $x, y, z \in \Sigma^*$, we say that 
$z$ distinguishes $x$ from $y$ if exactly one of the strings $xz$ and $yz$ is in $L$. If no string distinguishes 
$x$ and $y$, we say that $x$ and $y$ are $L$-equivalent and write $x \equiv_L y$. Thus, 

$x \equiv_L y \iff$ for every string $z \in \Sigma^*$, we have $xz \in L$ if and only if $yz \in L$. 

For example, let $L_{eo}$ denote the language of strings over {0, 1} with an even number of 0s 
and an odd number of 1s. Then the strings $x = 01$ and $y = 0011$ are distinguished by the string 
$z = 100$, because 

$xz = 01 \cdot 100 = 01100 \notin L_{eo}$ (three 0s and two 1s), 
$yz = 0011 \cdot 100 = 0011100 \in L_{eo}$ (four 0s and three 1s). 

On the other hand, it is quite easy to prove (hint, hint) that the strings 0001 and 1011 are 
$L_{eo}$-equivalent. 

Let $M$ be an arbitrary DFA for an arbitrary language $L$, and let $x$ and $y$ be arbitrary strings. If 
$x$ and $y$ lead to the same state in $M$, that is, if $\delta^*(s, x) = \delta^*(s, y)$, then we have 

$\delta^*(s, xz) = \delta^*(\delta^*(s,x), z) = \delta^*(\delta^*(s,y), z) = \delta^*(s, yz)$ 

for any string $z$. In particular, either $M$ accepts both $xz$ and $yz$, or $M$ rejects both $xz$ and $yz$, and 
therefore $x \equiv_L y$. It follows that if $x$ and $y$ are not $L$-equivalent, then any DFA that accepts $L$ 
has at least two distinct states $\delta^*(s,x) \ne \delta^*(s,y)$. 

Finally, a fooling set for L is a set F of strings such that every pair of strings in F has a 
distinguishing suffix. For example, F = {01, 101, 010, 1010} is a fooling set for the language $L_{eo}$ 
of strings with an even number of 0s and an odd number of 1s, because each pair of strings in F 
has a distinguishing suffix: 

• 0 distinguishes 01 and 101; 

• 0 distinguishes 01 and 010; 

• 0 distinguishes 01 and 1010; 

• 10 distinguishes 101 and 010; 

• 1 distinguishes 101 and 1010; 

• 1 distinguishes 010 and 1010. 
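
Claims like these are easy to check mechanically. The sketch below encodes membership in the even-0s, odd-1s language and verifies each distinguishing suffix listed above; the function names are hypothetical. The brute-force search at the end can only fail to find a short distinguishing suffix, so it offers evidence of equivalence, not a proof.

```python
from itertools import product


def in_leo(w):
    """Even number of 0s and odd number of 1s."""
    return w.count("0") % 2 == 0 and w.count("1") % 2 == 1


def distinguishes(z, x, y, member=in_leo):
    """True if exactly one of xz and yz is in the language."""
    return member(x + z) != member(y + z)


# The six (pair, suffix) claims from the list above.
CLAIMS = [
    ("01", "101", "0"),
    ("01", "010", "0"),
    ("01", "1010", "0"),
    ("101", "010", "10"),
    ("101", "1010", "1"),
    ("010", "1010", "1"),
]


def find_distinguishing_suffix(x, y, max_len, member=in_leo):
    """Search all binary suffixes up to max_len; None if none distinguishes."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            z = "".join(letters)
            if distinguishes(z, x, y, member):
                return z
    return None


all_claims_hold = all(distinguishes(z, x, y) for x, y, z in CLAIMS)
```

Running find_distinguishing_suffix("0001", "1011", 6) returns None, which is consistent with those two strings being equivalent.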

The pigeonhole principle now implies that for any integer k, if language L is accepted by a DFA 
with k states, then every fooling set for L contains at most k strings. This simple observation has 
two immediate corollaries. 

First, for any integer k, if L has a fooling set of size k, then every DFA that accepts L has at 
least k states. For example, the fooling set {01, 101, 010, 1010} proves that any DFA for $L_{eo}$ has at 
least four states. Thus, we can use fooling sets to prove that certain DFAs are as small as possible. 

Second, and more interestingly, if a language L is accepted by any DFA, then every fooling set 
for L must be finite. Equivalently: 

If L has an infinite fooling set, then L is not accepted by any DFA. 

This is arguably both the simplest and most powerful method for proving that a language is 
non-regular. Here are a few canonical examples of the fooling-set technique in action. 

Lemma 3.1. The language $L = \{0^n 1^n \mid n \ge 0\}$ is not regular. 

Proof: Consider the set $F = \{0^n \mid n \ge 0\}$, or more simply $F = 0^*$. Let $x$ and $y$ be arbitrary 
distinct strings in $F$. Then we must have $x = 0^i$ and $y = 0^j$ for some integers $i \ne j$. The suffix 
$z = 1^i$ distinguishes $x$ and $y$, because $xz = 0^i 1^i \in L$, but $yz = 0^j 1^i \notin L$. We conclude that $F$ is a 
fooling set for $L$. Because $F$ is infinite, $L$ cannot be regular. □ 

Lemma 3.2. The language $L = \{ww^R \mid w \in \Sigma^*\}$ of even-length palindromes is not regular. 

Proof: Let $x$ and $y$ be arbitrary distinct strings in $0^*1$. Then we must have $x = 0^i 1$ and $y = 0^j 1$ 
for some integers $i \ne j$. The suffix $z = 1 0^i$ distinguishes $x$ and $y$, because $xz = 0^i 1 1 0^i \in L$, but 
$yz = 0^j 1 1 0^i \notin L$. We conclude that $0^*1$ is a fooling set for $L$. Because $0^*1$ is infinite, $L$ cannot be 
regular. □ 

Lemma 3.3. The language $L = \{0^{2^n} \mid n \ge 0\}$ is not regular. 

Proof: Let $x$ and $y$ be arbitrary distinct strings in $L$. Then we must have $x = 0^{2^i}$ and $y = 0^{2^j}$ for 
some integers $i \ne j$. The suffix $z = 0^{2^i}$ distinguishes $x$ and $y$, because $xz = 0^{2^i + 2^i} = 0^{2^{i+1}} \in L$, 
but $yz = 0^{2^j + 2^i} \notin L$. We conclude that $L$ itself is a fooling set for $L$. Because $L$ is infinite, $L$ 
cannot be regular. □ 

Lemma 3.4. The language $L = \{0^p \mid p \text{ is prime}\}$ is not regular. 

Proof: Again, we use $0^*$ as our fooling set, but the actual argument is somewhat more 
complicated than in our earlier examples. 

Let $x$ and $y$ be arbitrary distinct strings in $0^*$. Then we must have $x = 0^i$ and $y = 0^j$ for some 
integers $i \ne j$. Without loss of generality, assume that $i < j$. Let $p$ be any prime number larger 
than $i$. Because $p + 0(j - i)$ is prime and $p + p(j - i) > p$ is not, there must be a positive integer 
$k \le p$ such that $p + (k-1)(j - i)$ is prime but $p + k(j - i)$ is not. Then the suffix $z = 0^{p+(k-1)j-ki}$ 
distinguishes $x$ and $y$: 

$xz = 0^i \, 0^{p+(k-1)j-ki} = 0^{p+(k-1)(j-i)} \in L$ because $p + (k-1)(j-i)$ is prime; 

$yz = 0^j \, 0^{p+(k-1)j-ki} = 0^{p+k(j-i)} \notin L$ because $p + k(j-i)$ is not prime. 

(Because $i < j$ and $i < p$, the suffix $0^{p+(k-1)j-ki} = 0^{(p-i)+(k-1)(j-i)}$ has positive length and 
therefore actually exists!) We conclude that $0^*$ is indeed a fooling set for $L$, which implies that $L$ 
is not regular. □ 
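
The choice of suffix in the proof of Lemma 3.4 can be traced with a small computation. The sketch below follows the argument step by step: pick a prime $p > i$, find the first $k$ where the arithmetic progression $p, p + (j-i), p + 2(j-i), \dots$ stops being prime, and return the suffix length $p + (k-1)j - ki$. The function names are hypothetical, and the trial-division primality test is chosen only for simplicity.

```python
def is_prime(n):
    """Trial-division primality test; fine for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_suffix_length(i, j):
    """Length of a suffix 0^m that distinguishes 0^i from 0^j (with i < j)
    with respect to the language {0^p : p prime}, following Lemma 3.4."""
    assert 0 <= i < j
    p = i + 1
    while not is_prime(p):      # smallest prime larger than i
        p += 1
    d = j - i
    for k in range(1, p + 1):   # some k <= p must work, since p + p*d is not prime
        if is_prime(p + (k - 1) * d) and not is_prime(p + k * d):
            return p + (k - 1) * j - k * i
    raise AssertionError("unreachable: p + p*(j - i) is divisible by p")
```

For any $i < j$, adding the returned length to $i$ gives a prime and adding it to $j$ gives a non-prime, which is exactly the distinguishing property used in the proof.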

One natural question that many students ask is "How did you come up with that fooling set?" 
Perhaps the simplest rule of thumb is that for most languages L — in particular, for almost all 
languages that students are asked to prove non-regular on homeworks or exams — either some 
simple regular language like 0* or 10*1 is a fooling set, or the language L itself is a fooling set. 
(Of course, there are well-engineered counterexamples.) 



*3.9 The Myhill-Nerode Theorem 

The fooling set technique implies a necessary condition for a language to be accepted by a 
DFA — the language must have no infinite fooling sets. In fact, this condition is also sufficient. 
The following powerful theorem was first proved by Anil Nerode in 1958, strengthening a 1957 
result of John Myhill. 1 

The Myhill-Nerode Theorem. For any language L, the following are equal: 

1 Myhill considered the finer equivalence relation $x \sim_L y$, meaning $wxz \in L$ if and only if $wyz \in L$ for all strings 
$w$ and $z$, and proved that $L$ is regular if and only if $\sim_L$ defines a finite number of equivalence classes. Like most 
of Myhill's early automata research, this result appears in an unpublished Air Force technical report. The modern 
Myhill-Nerode theorem appears (in an even more general form) as a minor lemma in Nerode's 1958 paper, which (not 
surprisingly) does not cite Myhill. 

(a) the minimum number of states in a DFA that accepts $L$, 

(b) the maximum size of a fooling set for $L$, and 

(c) the number of equivalence classes of $\equiv_L$. 

In particular, $L$ is accepted by a DFA if and only if every fooling set for $L$ is finite. 

Proof: Let L be an arbitrary language. 

We have already proved that the size of any fooling set for $L$ is at most the number of states 
in any DFA that accepts $L$, so (a) ≥ (b). It also follows directly from the definitions that $F \subseteq \Sigma^*$ is 
a fooling set for $L$ if and only if $F$ contains at most one string in each equivalence class of $\equiv_L$; 
thus, (b) = (c). We complete the proof by showing that (a) ≤ (c). 

We have already proved that if $\equiv_L$ has an infinite number of equivalence classes, there is no 
DFA that accepts $L$, so assume that the number of equivalence classes is finite. For any string $w$, 
let $[w]$ denote its equivalence class. We define a DFA $M_\equiv = (\Sigma, Q, s, A, \delta)$ as follows: 

$$\begin{aligned}
Q &:= \{[w] \mid w \in \Sigma^*\} \\
s &:= [\varepsilon] \\
A &:= \{[w] \mid w \in L\} \\
\delta([w], a) &:= [w \cdot a]
\end{aligned}$$

We claim that this DFA accepts the language L; this claim completes the proof of the theorem. 

But before we can prove anything about this DFA, we first need to verify that it is actually 
well-defined. Let $x$ and $y$ be two strings such that $[x] = [y]$. By definition of $L$-equivalence, 
for any string $z$, we have $xz \in L$ if and only if $yz \in L$. It immediately follows that for any 
symbol $a \in \Sigma$ and any string $z'$, we have $xaz' \in L$ if and only if $yaz' \in L$. Thus, by definition of 
$L$-equivalence, we have $[xa] = [ya]$ for every symbol $a \in \Sigma$. We conclude that the function $\delta$ is 
indeed well-defined. 

An easy inductive proof implies that $\delta^*([\varepsilon], x) = [x]$ for every string $x$. Thus, $M_\equiv$ accepts 
string $x$ if and only if $[x] = [w]$ for some string $w \in L$. But if $[x] = [w]$, then by definition 
(setting $z = \varepsilon$), we have $x \in L$ if and only if $w \in L$. So $M_\equiv$ accepts $x$ if and only if $x \in L$. In other 
words, $M_\equiv$ accepts $L$, as claimed, so the proof is complete. □ 

*3.10 Minimal Automata 

Given a DFA $M = (\Sigma, Q, s, A, \delta)$, suppose we want to find another DFA $M' = (\Sigma, Q', s', A', \delta')$ with 
the fewest possible states that accepts the same language. In this final section, we describe 
an efficient algorithm to minimize DFAs, first described (in slightly different form) by Edward 
Moore in 1956. We analyze the running time of Moore's algorithm in terms of two parameters: 
$n = |Q|$ and $\sigma = |\Sigma|$. 

In the preprocessing phase, we find and remove any states that cannot be reached from the 
start state $s$; this filtering can be performed in $O(n\sigma)$ time using any graph traversal algorithm. 
So from now on we assume that all states are reachable from $s$. 

Now define two states $p$ and $q$ in the trimmed DFA to be distinguishable, written $p \not\sim q$, if at 
least one of the following conditions holds: 

• $p \in A$ and $q \notin A$, 

• $p \notin A$ and $q \in A$, or 

• $\delta(p, a) \not\sim \delta(q, a)$ for some $a \in \Sigma$. 

Equivalently, $p \not\sim q$ if and only if there is a string $z$ such that exactly one of the states $\delta^*(p,z)$ 
and $\delta^*(q,z)$ is accepting. (Sound familiar?) Intuitively, the main algorithm assumes that all 
states are equivalent until proven otherwise, and then repeatedly looks for state pairs that can be 
proved distinguishable. 

The main algorithm maintains a two-dimensional table, indexed by the states, where 
Dist[p,q] = True indicates that we have proved states $p$ and $q$ are distinguishable. Initially, for all 
states $p$ and $q$, we set Dist[p,q] ← True if $p \in A$ and $q \notin A$ or vice versa, and Dist[p,q] ← False 
otherwise. Then we repeatedly consider each pair of states and each symbol to find more 
distinguished pairs, until we make a complete pass through the table without modifying it. The 
table-filling algorithm can be summarized as follows: 2 



MinDFATable(Σ, Q, s, A, δ): 
    for all p ∈ Q 
        for all q ∈ Q 
            if (p ∈ A and q ∉ A) or (p ∉ A and q ∈ A) 
                Dist[p,q] ← True 
            else 
                Dist[p,q] ← False 
    notdone ← True 
    while notdone 
        notdone ← False 
        for all p ∈ Q 
            for all q ∈ Q 
                if Dist[p,q] = False 
                    for all a ∈ Σ 
                        if Dist[δ(p,a), δ(q,a)] 
                            Dist[p,q] ← True 
                            notdone ← True 
    return Dist 



*** 



The algorithm must eventually halt, because there are only a finite number of entries in the 
table that can be marked. In fact, the main loop is guaranteed to terminate after at most $n$ 
iterations, which implies that the entire algorithm runs in $O(\sigma n^3)$ time. Once the table is filled, 
any two states $p$ and $q$ such that Dist[p,q] = False are equivalent and can be merged into a 
single state. The remaining details of constructing the minimized DFA are straightforward. 



Need to prove that the main loop terminates in at most n iterations. 



With more care, Moore's minimization algorithm can be modified to run in $O(\sigma n^2)$ time. A 
faster DFA minimization algorithm, due to John Hopcroft, runs in $O(\sigma n \log n)$ time. 
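
The table-filling pseudocode above translates almost line for line into runnable code. The sketch below is one possible rendering, not a tuned implementation; the function names are hypothetical, and brute_force_dfa is my reading of the earlier brute-force machine for the substring 11 (states are pairs of a flag and the last two symbols read), restricted to its reachable states.

```python
def distinguishable_table(sigma, states, delta, accepting):
    """Moore's table-filling algorithm: dist[p, q] is True iff p and q
    have been proved distinguishable."""
    dist = {(p, q): (p in accepting) != (q in accepting)
            for p in states for q in states}
    changed = True
    while changed:
        changed = False
        for p in states:
            for q in states:
                if not dist[p, q] and any(
                        dist[delta[p, a], delta[q, a]] for a in sigma):
                    dist[p, q] = True
                    changed = True
    return dist


def equivalence_classes(states, dist):
    """Group states that were never marked distinguishable."""
    classes = []
    for p in states:
        for cls in classes:
            if not dist[p, cls[0]]:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes


def brute_force_dfa():
    """Reachable part of a brute-force DFA for 'contains 11'.
    A state is (flag, last two symbols read); this encoding is an assumption."""
    sigma = "01"
    start = (False, "")
    delta, states, frontier = {}, [start], [start]
    while frontier:
        flag, last = frontier.pop()
        for a in sigma:
            new_last = (last + a)[-2:]
            nxt = (flag or new_last == "11", new_last)
            delta[(flag, last), a] = nxt
            if nxt not in states:
                states.append(nxt)
                frontier.append(nxt)
    accepting = {q for q in states if q[0]}
    return sigma, states, delta, start, accepting
```

Calling equivalence_classes on the table for brute_force_dfa() groups its ten reachable states into three classes, matching the three-state DFA found by hand.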



2 More experienced readers should become queasy at the mere suggestion that any algorithm merely fills in a table, 
as opposed to evaluating a recurrence. This algorithm is no exception. Consider the boolean function $Dist(p,q,k)$, 
which equals True if and only if $p$ and $q$ can be distinguished by some string of length at most $k$. This function obeys 
the following recurrence: 

$$Dist(p,q,k) = \begin{cases}
(p \in A) \oplus (q \in A) & \text{if } k = 0, \\
Dist(p,q,k-1) \lor \bigvee_{a \in \Sigma} Dist(\delta(p,a), \delta(q,a), k-1) & \text{otherwise.}
\end{cases}$$

Moore's "table-filling" algorithm is just a space-efficient dynamic programming algorithm to evaluate this recurrence. 

Example 

To get a better idea how this algorithm works, let's visualize the algorithm running on our 
earlier brute-force DFA for strings containing the substring 11. This DFA has four unreachable 
states: (False, 11), (True, ε), (True, 0), and (True, 1). We remove these states, and relabel the 
remaining states for easier reference. (In an actual implementation, the states would almost 
certainly be represented by indices into an array anyway, not by mnemonic labels.) 




Our brute-force DFA for strings containing the substring 11, after removing all four unreachable states 

The main algorithm initializes (the bottom half of) a 10 x 10 table as follows. (In the 
implementation, cells marked ≠ have value True and blank cells have value False.) 

       0  1  2  3  4  5  6  7  8 
    1 
    2 
    3 
    4 
    5 
    6  ≠  ≠  ≠  ≠  ≠  ≠ 
    7  ≠  ≠  ≠  ≠  ≠  ≠ 
    8  ≠  ≠  ≠  ≠  ≠  ≠ 
    9  ≠  ≠  ≠  ≠  ≠  ≠ 



In the first iteration of the main loop, the algorithm discovers several distinguishable pairs 
of states. For example, the algorithm sets Dist[0, 2] ← True because Dist[δ(0, 1), δ(2, 1)] = 
Dist[2, 9] = True. After the iteration ends, the table looks like this: 

       0  1  2  3  4  5  6  7  8 
    1 
    2  ≠  ≠ 
    3        ≠ 
    4  ≠  ≠     ≠ 
    5        ≠     ≠ 
    6  ≠  ≠  ≠  ≠  ≠  ≠ 
    7  ≠  ≠  ≠  ≠  ≠  ≠ 
    8  ≠  ≠  ≠  ≠  ≠  ≠ 
    9  ≠  ≠  ≠  ≠  ≠  ≠ 


The second iteration of the while loop makes no further changes to the table — We got lucky! — so 
the algorithm terminates. 

The final table implies that the states of our trimmed DFA fall into exactly three equivalence 
classes: {0, 1, 3, 5}, {2, 4}, and {6, 7, 8, 9}. Replacing each equivalence class with a single state 
gives us the three-state DFA that we already discovered. 




Equivalence classes of states in the trimmed DFA, and the resulting minimal equivalent DFA. 

Exercises 

1. For each of the following languages in {0, 1}*, describe a deterministic finite-state machine 
that accepts that language. There are infinitely many correct answers for each language. 
"Describe" does not necessarily mean "draw". 

(a) Only the string 0110. 

(b) Every string except 0110. 

(c) Strings that contain the substring 0110. 

(d) Strings that do not contain the substring 0110. 

*(e) Strings that contain an even number of occurrences of the substring 0110. (For 
example, this language contains the strings 0110110 and 01011.) 

(f) Strings that contain the subsequence 0110. 

(g) Strings that do not contain the subsequence 0110. 

(h) Strings that contain an even number of 1s and an odd number of 0s. 

(i) Strings that represent a number divisible by 7 in binary. 

(j) Strings whose reversals represent a number divisible by 7 in binary. 

(k) Strings in which the substrings 01 and 10 appear the same number of times. 

(l) Strings such that in every prefix, the number of 0s and the number of 1s differ by at 
most 1. 

(m) Strings such that in every prefix, the number of 0s and the number of 1s differ by at 
most 4. 

(n) Strings that end with $0^{10}$ = 0000000000. 

(o) Strings in which the number of 1s is even, the number of 0s is divisible by 3, the 
overall length is divisible by 5, the binary value is divisible by 7, the binary value of 
the reversal is divisible by 11, and that do not contain thirteen 1s in a row. [Hint: This 
is more tedious than difficult.] 

2. (a) Let $L \subseteq 0^*$ be an arbitrary unary language. Prove that $L^*$ is regular. 

(b) Prove that there is a binary language $L \subseteq (0 + 1)^*$ such that $L^*$ is not regular. 



*** 



3. Describe and analyze algorithms for the following problems. In each case, the input is a 
DFA M over the alphabet $\Sigma$ = {0, 1}. 

(a) Does M accept any string whose length is a multiple of 5? 

(b) Does M accept every string that represents a number divisible by 7 in binary? 

(c) Does M accept an infinite number of strings containing an odd number of 0s? 

(d) Does M accept a finite number of strings that contain the substring 0110110 and 
whose length is divisible by five? 

(e) Does M accept only strings whose lengths are perfect squares? 

(f) Does M accept any string whose length is composite? 
*(g) Does M accept any string whose length is prime? 



Move these to the graph traversal notes? 



4. Prove that each of the following languages cannot be accepted by a DFA. 

(a) $\{0^{n^2} \mid n \ge 0\}$ 

(b) $\{0^{n^3} \mid n \ge 0\}$ 

(c) $\{0^{f(n)} \mid n \ge 0\}$, where $f(n)$ is any fixed polynomial in $n$ with degree at least 2. 

(d) $\{0^n \mid n \text{ is composite}\}$ 

(e) $\{0^n 1 0^n \mid n \ge 0\}$ 

(f) $\{0^i 1^j \mid i \ne j\}$ 

(g) $\{0^i 1^j \mid i < 3j\}$ 

(h) $\{0^i 1^j \mid i \text{ and } j \text{ are relatively prime}\}$ 

(i) $\{0^i 1^j \mid j - i \text{ is a perfect square}\}$ 

(j) $\{w\#w \mid w \in (0+1)^*\}$ 

(k) $\{ww \mid w \in (0+1)^*\}$ 

(l) $\{w\#0^{|w|} \mid w \in (0+1)^*\}$ 

(m) $\{w0^{|w|} \mid w \in (0+1)^*\}$ 

(n) $\{xy \mid x, y \in (0+1)^* \text{ and } |x| = |y| \text{ but } x \ne y\}$ 

(o) $\{0^m 1^n 0^{m+n} \mid m, n \ge 0\}$ 

(p) $\{0^m 1^n 0^{mn} \mid m, n \ge 0\}$ 

(q) Strings in which the substrings 00 and 11 appear the same number of times. 

(r) The set of all palindromes in $(0 + 1)^*$ whose length is divisible by 7. 

(s) $\{w \in (0 + 1)^* \mid w \text{ is the binary representation of a perfect square}\}$ 

*(t) $\{w \in (0 + 1)^* \mid w \text{ is the binary representation of a prime number}\}$ 

5. For each of the following languages over the alphabet $\Sigma$ = {0, 1}, either describe a DFA 
that accepts the language or prove that no such DFA exists. Recall that $\Sigma^+$ denotes the 
set of all nonempty strings over $\Sigma$. [Hint: Believe it or not, most of these languages can be 
accepted by DFAs.] 

(a) $\{0^n w 1^n \mid w \in \Sigma^* \text{ and } n \ge 0\}$ 

(b) $\{0^n 1^n w \mid w \in \Sigma^* \text{ and } n \ge 0\}$ 

(c) $\{w 0^n 1^n x \mid w, x \in \Sigma^* \text{ and } n \ge 0\}$ 

(d) $\{0^n w 1^n x \mid w, x \in \Sigma^* \text{ and } n \ge 0\}$ 

(e) $\{0^n w 1 x 0^n \mid w, x \in \Sigma^* \text{ and } n \ge 0\}$ 

(f) $\{wxw \mid w, x \in \Sigma^*\}$ 

(g) $\{wxw \mid w, x \in \Sigma^+\}$ 

(h) $\{wxw^R \mid w, x \in \Sigma^+\}$ 

(i) $\{wwx \mid w, x \in \Sigma^+\}$ 

(j) $\{ww^R x \mid w, x \in \Sigma^+\}$ 

(k) $\{wxwy \mid w, x, y \in \Sigma^+\}$ 

(l) $\{wxw^R y \mid w, x, y \in \Sigma^+\}$ 

(m) $\{xwwy \mid w, x, y \in \Sigma^+\}$ 

(n) $\{xww^R y \mid w, x, y \in \Sigma^+\}$ 

(o) $\{wxxw \mid w, x \in \Sigma^+\}$ 

*(p) $\{wxw^R x \mid w, x \in \Sigma^+\}$ 

Caveat lector! This is the first edition of this lecture note. Some topics are incomplete, and 
there are almost certainly a few serious errors. Please send bug reports and suggestions to 
jeffe@illinois.edu. 



Nothing is better than simplicity .... 

nothing can make up for excess or for the lack of definiteness. 

— Walt Whitman, Preface to Leaves of Grass (1855) 

Freedom of choice 
Is what you got. 
Freedom from choice 
Is what you want. 

— Devo, "Freedom of Choice", Freedom of Choice (1980) 

Nondeterminism means never having to say you are wrong. 

— BSD 4.3 fortune(6) file (c.1985) 

4 Nondeterminism 

4.1 Nondeterministic State Machines 

The following diagram shows something that looks like a finite-state machine over the alphabet 
{0, 1}, but on closer inspection, it is not consistent with our earlier definitions. On one hand, 
there are two transitions out of s for each input symbol. On the other hand, states a and b are 
each missing an outgoing transition. 




A nondeterministic finite-state automaton 



Nevertheless, there is a sense in which this machine "accepts" the set of all strings that 
contain either 00 or 11 as a substring. Imagine that when the machine reads a symbol in state 
s, it makes a choice about which transition to follow. If the input string contains the substring 
00, then it is possible for the machine to end in the accepting state c, by choosing to move into 
state a when it reads a 0 immediately before another 0. Similarly, if the input string contains the 
substring 11, it is possible for the machine to end in the accepting state c. On the other hand, 
if the input string does not contain either 00 or 11 — or in other words, if the input alternates 
between 0 and 1 — there are no choices that lead the machine to the accepting state. If the 
machine incorrectly chooses to transition to state a and then reads a 1, or transitions to b and 
then reads 0, it explodes; the only way to avoid an explosion is to stay in state s. 

This object is an example of a nondeterministic finite-state automaton, or NFA, so named 
because its behavior is not uniquely determined by the input string. Formally, every NFA has five 
components: 

© Copyright 2014 Jeff Erickson. 
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). 
Free distribution is strongly encouraged; commercial distribution is expressly forbidden. 
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision. 

• An arbitrary finite set $\Sigma$, called the input alphabet. 

• Another arbitrary finite set $Q$, whose elements are called states. 

• An arbitrary transition function $\delta\colon Q \times \Sigma \to 2^Q$. 

• A start state $s \in Q$. 

• A subset $A \subseteq Q$ of accepting states. 

The only difference from the formal definition of deterministic finite-state automata is the range 
of the transition function. In a DFA, the transition function always returns a single state; in an 
NFA, the transition function returns a set of states, which could be empty, or all of $Q$, or anything 
in between. 

Just like DFAs, the behavior of an NFA is governed by an input string $w \in \Sigma^*$, which the 
machine reads one symbol at a time, from left to right. Unlike DFAs, however, an NFA does not 
maintain a single current state, but rather a set of current states. Whenever the NFA reads a 
symbol $a$, its set of current states changes from $C$ to $\bigcup_{q \in C} \delta(q, a)$. After all symbols have been 
read, the NFA accepts $w$ if its current state set contains at least one accepting state and rejects $w$ 
otherwise. In particular, if the set of current states ever becomes empty, it will stay empty forever, 
and the NFA will reject. 

More formally, we define the function $\delta^*\colon Q \times \Sigma^* \to 2^Q$ that transitions on strings as follows: 

$$\delta^*(q, w) := \begin{cases}
\{q\} & \text{if } w = \varepsilon, \\
\bigcup_{r \in \delta(q,a)} \delta^*(r, x) & \text{if } w = ax.
\end{cases}$$

The NFA $(Q, \Sigma, \delta, s, A)$ accepts $w \in \Sigma^*$ if and only if $\delta^*(s, w) \cap A \ne \varnothing$. 

We can equivalently define an NFA as a directed graph whose vertices are the states $Q$ and whose 
edges are labeled with symbols from $\Sigma$. We no longer require that every vertex has exactly one 
outgoing edge with each label; it may have several such edges or none. An NFA accepts a string $w$ 
if the graph contains at least one walk from the start state to an accepting state whose label is $w$. 
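
The "set of current states" view leads directly to a simple simulation. The sketch below encodes my reading of the example NFA for strings containing 00 or 11 (the figure itself is not reproduced here, so the exact transitions are an assumption based on the prose): state s loops on both symbols and may also guess a on 0 or b on 1, and c is the accepting sink. Missing dictionary entries stand for the empty set.

```python
# Transition function of the example NFA; absent keys mean "no transition".
# This table is an assumption reconstructed from the description in the text.
NFA_DELTA = {
    ("s", "0"): {"s", "a"},
    ("s", "1"): {"s", "b"},
    ("a", "0"): {"c"},
    ("b", "1"): {"c"},
    ("c", "0"): {"c"},
    ("c", "1"): {"c"},
}


def nfa_accepts(delta, start, accepting, w):
    """Track the set of current states while reading w, one symbol at a time."""
    current = {start}
    for symbol in w:
        # Union of delta(q, symbol) over every current state q.
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & accepting)
```

For example, nfa_accepts(NFA_DELTA, "s", {"c"}, "0100") is true, while the alternating string "0101" is rejected.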

4.2 Intuition 

There are at least three useful ways to think about non-determinism. 

Clairvoyance. Whenever an NFA reads symbol a in state q, it chooses the next state from the 
set 5(q, a), always magically choosing a state that leads to the NFA accepting the input string, 
unless no such choice is possible. As the BSD fortune file put it, "Nondeterminism means never 
having to say you're wrong." 1 Of course real machines can't actually look into the future; that's 
why I used the word "magic". 

Parallel threads. An arguably more "realistic" view is that when an NFA reads symbol $a$ in 
state $q$, it spawns an independent execution thread for each state in $\delta(q,a)$. In particular, if 
$\delta(q, a)$ is empty, the current thread simply dies. The NFA accepts if at least one thread is in an 
accepting state after it reads the last input symbol. 

Equivalently, we can imagine that when an NFA reads symbol $a$ in state $q$, it branches into 
several parallel universes, one for each state in $\delta(q, a)$. If $\delta(q, a)$ is empty, the NFA destroys the 
universe (including itself). Similarly, if the NFA finds itself in a non-accepting state when the 
input ends, the NFA destroys the universe. Thus, when the input is gone, only universes in which 
the NFA somehow chose a path to an accept state still exist. One slight disadvantage of this 
metaphor is that if an NFA reads a string that is not in its language, it destroys all universes. 

1 This sentence is a riff on a horrible aphorism that was (sadly) popular in the US in the 70s and 80s. Fortunately, 
everyone seems to have forgotten the original saying, except for that one time it was parodied on the Simpsons. 

Proofs/oracles. Finally, we can treat NFAs not as a mechanism for computing something, but 
only as a mechanism for verifying proofs. If we want to prove that a string $w$ contains one of the 
substrings 00 or 11, it suffices to demonstrate a single walk in our example NFA that starts at s and 
ends at c, and whose edges are labeled with the symbols in $w$. Equivalently, whenever the NFA 
faces a nontrivial choice, the prover can simply tell the NFA which state to move to next. 

This intuition can be formalized as follows. Consider a deterministic finite-state machine $M$ 
whose input alphabet is the product $\Sigma \times \Omega$ of an input alphabet $\Sigma$ and an oracle alphabet $\Omega$. 
Equivalently, we can imagine that this DFA reads simultaneously from two strings of the same 
length: the input string $w$ and the oracle string $\omega$. In either formulation, the transition function 
has the form $\delta\colon Q \times \Sigma \times \Omega \to Q$. As usual, this DFA accepts the pair $(w, \omega) \in (\Sigma \times \Omega)^*$ if and only 
if $\delta^*(s, w, \omega) \in A$. Finally, $M$ nondeterministically accepts the string $w \in \Sigma^*$ if there is an oracle 
string $\omega \in \Omega^*$ with $|\omega| = |w|$ such that $(w, \omega) \in L(M)$. 

4.3 ε-Transitions 

It is fairly common for NFAs to include so-called ε-transitions, which allow the machine to 
change state without reading an input symbol. An NFA with ε-transitions accepts a string $w$ if 
and only if there is a sequence of transitions 
$s \xrightarrow{a_1} q_1 \xrightarrow{a_2} q_2 \xrightarrow{a_3} \cdots \xrightarrow{a_\ell} q_\ell$ 
where the final state $q_\ell$ is accepting, each $a_i$ is either $\varepsilon$ or a symbol in $\Sigma$, and $a_1 a_2 \cdots a_\ell = w$. 

More formally, the transition function in an NFA with ε-transitions has a slightly larger domain 
$\delta\colon Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$. The ε-reach of a state $q \in Q$ consists of all states $r$ that satisfy one of the 
following conditions: 

• $r = q$ 

• $r \in \delta(q', \varepsilon)$ for some state $q'$ in the ε-reach of $q$. 

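In other words, $r$ is in the ε-reach of $q$ if there is a (possibly empty) sequence of ε-transitions 
leading from $q$ to $r$. Now we redefine the extended transition function $\delta^*\colon Q \times \Sigma^* \to 2^Q$, which 
transitions on arbitrary strings, as follows: 

$$\delta^*(q, w) := \begin{cases}
\varepsilon\text{-reach}(q) & \text{if } w = \varepsilon, \\
\bigcup_{r \in \varepsilon\text{-reach}(q)} \ \bigcup_{r' \in \delta(r,a)} \delta^*(r', x) & \text{if } w = ax.
\end{cases}$$

As usual, the modified NFA accepts a string $w$ if and only if $\delta^*(s, w) \cap A \ne \varnothing$. 

Given an NFA $M = (\Sigma, Q, s, A, \delta)$ with ε-transitions, we can easily construct an equivalent 
NFA $M' = (\Sigma, Q', s', A', \delta')$ without ε-transitions as follows: 

$$\begin{aligned}
Q' &:= Q \\
s' &:= s \\
A' &:= \{q \in Q \mid \varepsilon\text{-reach}(q) \cap A \ne \varnothing\} \\
\delta'(q, a) &:= \bigcup_{r \in \varepsilon\text{-reach}(q)} \delta(r, a)
\end{aligned}$$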
Straightforward definition-chasing implies that M and M' accept exactly the same language. 
Thus, whenever we reason about or design NFAs, we are free to either allow or forbid ε-transitions, 
whichever is more convenient for the task at hand. 
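
The ε-removal construction can be sketched in a few lines of Python. This is only an illustration under assumed conventions that are not part of the notes: the transition function is a dictionary mapping (state, label) pairs to sets of states, the empty string "" stands for the label ε, and the helper names (eps_reach, remove_eps, accepts) and the toy NFA for 0*1 are hypothetical.

```python
from collections import deque

EPS = ""  # assumed encoding of the label ε

def eps_reach(delta, q):
    """All states reachable from q by zero or more ε-transitions."""
    seen = {q}
    queue = deque([q])
    while queue:
        p = queue.popleft()
        for r in delta.get((p, EPS), set()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen

def remove_eps(states, alphabet, delta, accepting):
    """Return (delta', A') for an equivalent NFA with the same states and start state."""
    reach = {q: eps_reach(delta, q) for q in states}
    new_accepting = {q for q in states if reach[q] & accepting}
    new_delta = {}
    for q in states:
        for a in alphabet:
            targets = set()
            for r in reach[q]:                    # delta'(q, a) = union of delta(r, a)
                targets |= delta.get((r, a), set())
            if targets:
                new_delta[(q, a)] = targets
    return new_delta, new_accepting

def accepts(delta, start, accepting, w):
    """Simulate an ε-free NFA by tracking the set of current states."""
    current = {start}
    for a in w:
        current = set().union(*(delta.get((p, a), set()) for p in current))
    return bool(current & accepting)

# Hypothetical toy NFA for 0*1: s -ε-> a, a -0-> a, a -ε-> b, b -1-> t.
toy_states = {"s", "a", "b", "t"}
toy_delta = {("s", EPS): {"a"}, ("a", "0"): {"a"}, ("a", EPS): {"b"}, ("b", "1"): {"t"}}
toy_delta2, toy_accepting2 = remove_eps(toy_states, "01", toy_delta, {"t"})
```

The new machine keeps the old state set and start state; only the transition function and the accepting set change, exactly as in the definition of M'.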

4.4 Kleene's Theorem 

We are now finally in a position to prove the following fundamental fact, first observed by Stephen 
Kleene: 

Theorem 4.1. A language L can be described by a regular expression if and only if L is the language 
accepted by a DFA. 

We will prove Kleene's fundamental theorem in four stages: 

• Every DFA can be transformed into an equivalent NFA. 

• Every NFA can be transformed into an equivalent DFA. 

• Every regular expression can be transformed into an NFA. 

• Every NFA can be transformed into an equivalent regular expression. 

The first of these four transformations is completely trivial; a DFA is just a special type of NFA 
where the transition function always returns a single state. Unfortunately, the other three 
transformations require a bit more work. 

4.5 DFAs from NFAs: The Subset Construction 

In the parallel-thread model of NFA execution, an NFA does not have a single current state, but 
rather a set of current states. The evolution of this set of states is determined by a modified 
transition function δ' : 2^Q × Σ → 2^Q, defined by setting δ'(P, a) := ⋃_{p ∈ P} δ(p, a) for any set of 
states P ⊆ Q and any symbol a ∈ Σ. When the NFA finishes reading its input string, it accepts if 
and only if the current set of states intersects the set A of accepting states. 

This formulation makes the NFA completely deterministic! We have just shown that any NFA 
M = (Σ, Q, s, A, δ) is equivalent to a DFA M' = (Σ, Q', s', A', δ') defined as follows: 

$$
\begin{aligned}
Q' &:= 2^Q \\
s' &:= \{s\} \\
A' &:= \{\, S \subseteq Q \mid S \cap A \neq \varnothing \,\} \\
\delta'(q', a) &:= \bigcup_{p \in q'} \delta(p, a) \quad \text{for all } q' \subseteq Q \text{ and } a \in \Sigma.
\end{aligned}
$$

Similarly, any NFA with ε-transitions is equivalent to a DFA with the transition function 

$$
\delta'(q', a) := \bigcup_{p \in q'} \; \bigcup_{r \in \varepsilon\text{-reach}(p)} \delta(r, a)
$$

for all q' ⊆ Q and a ∈ Σ. This conversion from NFA to DFA is often called the subset construction, 
but that name is somewhat misleading; it's not a "construction" so much as a change in perspective. 

One disadvantage of this "construction" is that it usually leads to DFAs with far more states 
than necessary, in part because most of those states are unreachable. These unreachable states 
can be avoided by constructing the DFA incrementally, essentially by performing a breadth-first 
search of the DFA graph, starting at its start state. 

To execute this algorithm by hand, we prepare a table with |Σ| + 3 columns, with one row for 
each DFA state we discover. In order, these columns record the following information: 

• The DFA state (as a subset of NFA states) 

• The ε-reach of the corresponding subset of NFA states 

• Whether the DFA state is accepting (that is, whether the ε-reach intersects A) 

• The output of the transition function for each symbol in Σ. 

We start with DFA-state s in the first row and first column. Whenever we discover an unexplored 
state in one of the last |Σ| columns, we copy it to the left column in a new row. 
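
The table-filling procedure above can be sketched as a breadth-first search in Python. This is a minimal illustration, assuming a dictionary encoding of the transition function with "" as the ε label; the function names and the encoding of the example NFA (states s, a, b, c, read off the table in this section) are assumptions, not part of the notes.

```python
from collections import deque

EPS = ""  # assumed encoding of the label ε

def eps_reach_set(delta, subset):
    """ε-reach of a set of NFA states."""
    seen = set(subset)
    stack = list(subset)
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def incremental_subset_construction(alphabet, delta, start, accepting):
    """Build only the reachable DFA states; one table row per discovered state."""
    start_state = frozenset([start])
    table = {}                      # DFA state -> (is accepting?, {symbol: DFA state})
    queue = deque([start_state])
    while queue:
        q = queue.popleft()
        if q in table:
            continue                # this row has already been filled in
        reach = eps_reach_set(delta, q)
        row = {}
        for a in alphabet:
            target = frozenset(t for r in reach for t in delta.get((r, a), ()))
            row[a] = target
            if target not in table:
                queue.append(target)   # newly discovered state: add a row later
        table[q] = (bool(reach & accepting), row)
    return start_state, table

# The example NFA for strings containing 00 or 11 (assumed encoding).
example_delta = {
    ("s", "0"): {"s", "a"}, ("s", "1"): {"s", "b"},
    ("a", "0"): {"c"},      ("b", "1"): {"c"},
    ("c", "0"): {"c"},      ("c", "1"): {"c"},
}
dfa_start, dfa_table = incremental_subset_construction("01", example_delta, "s", {"c"})
```

Each dictionary entry corresponds to one row of the hand-built table; states that are never discovered are never created, which is the whole point of the incremental version.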

For example, given the NFA on the first page of this note, this incremental algorithm produces 
the following table, yielding a five-state DFA. For this example, the second column is redundant, 
because the NFA has no ε-transitions, but we will see another example with ε-transitions in the 
next subsection. To simplify notation, we write each set of states as a simple string, omitting 
braces and commas. 



    q'     ε-reach(q')   q' ∈ A'?   δ'(q', 0)   δ'(q', 1)
    s      s                        as          bs
    as     as                       acs         bs
    bs     bs                       as          bcs
    acs    acs           ✓          acs         bcs
    bcs    bcs           ✓          acs         bcs




Our example NFA, and the output of the incremental subset construction for that NFA. 



4.6 NFAs from Regular Expressions: Thompson's Algorithm 

Lemma 4.2. Every regular language is accepted by a non-deterministic finite automaton. 

Proof: In fact, we will prove the following stronger claim: Every regular language is accepted 
by an NFA with exactly one accepting state t, which is distinct from its start state s. The following 
construction was first described by Ken Thompson in 1968. 

Let R be an arbitrary regular expression over an arbitrary finite alphabet Σ. Assume that for 
any sub-expression S of R, the language described by S is accepted by an NFA with one accepting 
state distinct from its start state, which we denote pictorially by →○[S]◎. There are six cases 
to consider (three base cases and three recursive cases), mirroring the recursive definition of a 
regular expression. 

• If R = ∅, then L(R) = ∅ is accepted by the empty NFA: →○ ◎. 

• If R = ε, then L(R) = {ε} is accepted by the NFA →○ -ε→ ◎. 

• If R = a for some symbol a ∈ Σ, then L(R) = {a} is accepted by the NFA →○ -a→ ◎. (The 
case where R is a single string with length greater than 1 reduces to the single-symbol case 
by concatenation, as described in the next case.) 

• Suppose R = ST for some regular expressions S and T. The inductive hypothesis implies 
that the languages L(S) and L(T) are accepted by NFAs →○[S]◎ and →○[T]◎, respectively. 
Then L(R) = L(ST) = L(S) • L(T) is accepted by the NFA →○[S]○ -ε→ ○[T]◎, built by 
connecting the two component NFAs in series. 

• Suppose R = S + T for some regular expressions S and T. The inductive hypothesis 
implies that the languages L(S) and L(T) are accepted by NFAs →○[S]◎ and →○[T]◎, 
respectively. Then L(R) = L(S + T) = L(S) ∪ L(T) is accepted by the NFA 
[Figure: the NFAs for S and T side by side, between a new start state and a new accepting state], 
built by connecting the two component NFAs in parallel with new start and accept states. 

• Finally, suppose R = S* for some regular expression S. The inductive hypothesis implies that 
the language L(S) is accepted by an NFA →○[S]◎. Then the language L(R) = L(S*) = L(S)* 
is accepted by the NFA 
[Figure: the NFA for S, wrapped with a new start state, a new accepting state, and ε-transitions]. 

In all cases, the language L(R) is accepted by an NFA with one accepting state, which is different 
from its start state, as claimed. □ 
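
The recursive construction in this proof can be sketched in Python as follows. Everything about the encoding is an assumption made for illustration: regular expressions are nested tuples such as ("sym", "0"), ("cat", S, T), ("alt", S, T), ("star", S), ("eps",) and ("empty",); ε is the label ""; and the star case uses one standard wiring of ε-transitions, which may differ in detail from the figure in the notes.

```python
import itertools

def thompson(regex):
    """Return (start, accept, edges, number_of_states) for a regex given as nested tuples."""
    fresh = itertools.count()
    edges = []                        # (source, label, target); label "" means ε

    def build(r):
        if r[0] == "cat":             # series connection: no new states
            s1, t1 = build(r[1])
            s2, t2 = build(r[2])
            edges.append((t1, "", s2))
            return s1, t2
        s, t = next(fresh), next(fresh)   # new start and accepting states
        if r[0] == "eps":
            edges.append((s, "", t))
        elif r[0] == "sym":
            edges.append((s, r[1], t))
        elif r[0] == "alt":           # parallel connection
            for sub in (r[1], r[2]):
                si, ti = build(sub)
                edges.extend([(s, "", si), (ti, "", t)])
        elif r[0] == "star":          # assumed wiring: enter, leave, skip, repeat
            si, ti = build(r[1])
            edges.extend([(s, "", si), (ti, "", t), (s, "", t), (ti, "", si)])
        elif r[0] != "empty":
            raise ValueError(f"unknown node {r[0]!r}")
        return s, t

    start, accept = build(regex)
    return start, accept, edges, next(fresh)

def nfa_accepts(nfa, w):
    """Simulate the NFA, taking ε-reaches before and after every symbol."""
    start, accept, edges, _ = nfa

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for src, label, dst in edges:
                if src == p and label == "" and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for a in w:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == a})
    return accept in current

# (0 + 10*1)* as a nested tuple.
even_ones = ("star", ("alt", ("sym", "0"),
                      ("cat", ("sym", "1"),
                       ("cat", ("star", ("sym", "0")), ("sym", "1")))))
even_ones_nfa = thompson(even_ones)
```

Every case except concatenation allocates exactly two new states, so the number of states is twice the number of symbols, ε's, pluses and stars in the expression.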

As an example, given the regular expression (0 + 10*1)* of strings containing an even number 
of 1s, Thompson's algorithm produces a 14-state NFA shown on the next page. As this example 
shows, Thompson's algorithm tends to produce NFAs with many redundant states. Fortunately, 
just as there are for DFAs, there are algorithms that can reduce any NFA to an equivalent NFA 
with the smallest possible number of states. 





The NFA constructed by Thompson's algorithm for the regular expression (0 + 10*1)*. 
The four non-ε-transitions are drawn with bold red arrows for emphasis. 

Interestingly, applying the incremental subset algorithm to Thompson's NFA tends to yield a 
DFA with relatively few states, in part because the states in Thompson's NFA tend to have large 
ε-reach, and in part because relatively few of those states are the targets of non-ε-transitions. 
Starting with the NFA shown above, for example, the incremental subset construction yields a 
DFA for the language (0 + 10*1)* with just five states: 
    q'   ε-reach(q')   q' ∈ A'?   δ'(q', 0)   δ'(q', 1)
    s    sabjm         ✓          k           c
    k    sabjklm       ✓          k           c
    c    cdegh                    f           i
    f    defgh                    f           i
    i    sabjilm       ✓          k           c

The DFA computed by the incremental subset algorithm from Thompson's NFA for (0 + 10*1)*. 

This DFA can be further simplified to just two states, by observing that all three accepting 
states are equivalent, and that both non-accepting states are equivalent. But still, five states is 
pretty good, especially compared with the 2^14 = 16384 states that the naive subset construction 
would yield! 

*4.7 NFAs from Regular Expressions: Glushkov's Algorithm 

Thompson's algorithm is actually a modification of an earlier algorithm, which was independently 
discovered by Robert McNaughton and Hisao Yamada in 1960 and by V. I. Glushkov in 1961. Given 
a regular expression containing n symbols (not counting the parentheses and pluses and stars), 
Glushkov's algorithm produces an NFA with exactly n + 1 states. 

Glushkov's algorithm combines six functions on regular expressions: 

• index(R) is the regular expression obtained by replacing the symbols in R with the integers 
1 through n, in order from left to right. For example, index((0 + 10*1)*) = (1 + 23*4)*. 

• symbols(R) denotes the string obtained by removing all non-symbols from R. For example, 
symbols((0 + 10*1)*) = 0101. 

• has-ε(R) is True if ε ∈ L(R) and False otherwise. 

• first(R) is the set of all initial symbols of strings in L(R). 

• last(R) is the set of all final symbols of strings in L(R). 

• middle(R) is the set of all pairs (a, b) such that ab is a substring of some string in L(R). 

The last four functions obey the following recurrences: 
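$$
\begin{aligned}
\textit{has-}\varepsilon(\varnothing) &= \text{False} \\
\textit{has-}\varepsilon(w) &= \begin{cases} \text{True} & \text{if } w = \varepsilon \\ \text{False} & \text{otherwise} \end{cases} \\
\textit{has-}\varepsilon(S + T) &= \textit{has-}\varepsilon(S) \lor \textit{has-}\varepsilon(T) \\
\textit{has-}\varepsilon(ST) &= \textit{has-}\varepsilon(S) \land \textit{has-}\varepsilon(T) \\
\textit{has-}\varepsilon(S^*) &= \text{True}
\end{aligned}
$$

$$
\begin{aligned}
\textit{first}(\varnothing) &= \varnothing
  & \textit{last}(\varnothing) &= \varnothing \\
\textit{first}(w) &= \begin{cases} \varnothing & \text{if } w = \varepsilon \\ \{a\} & \text{if } w = ax \end{cases}
  & \textit{last}(w) &= \begin{cases} \varnothing & \text{if } w = \varepsilon \\ \{a\} & \text{if } w = xa \end{cases} \\
\textit{first}(S + T) &= \textit{first}(S) \cup \textit{first}(T)
  & \textit{last}(S + T) &= \textit{last}(S) \cup \textit{last}(T) \\
\textit{first}(ST) &= \begin{cases} \textit{first}(S) \cup \textit{first}(T) & \text{if } \textit{has-}\varepsilon(S) \\ \textit{first}(S) & \text{otherwise} \end{cases}
  & \textit{last}(ST) &= \begin{cases} \textit{last}(S) \cup \textit{last}(T) & \text{if } \textit{has-}\varepsilon(T) \\ \textit{last}(T) & \text{otherwise} \end{cases} \\
\textit{first}(S^*) &= \textit{first}(S)
  & \textit{last}(S^*) &= \textit{last}(S)
\end{aligned}
$$

$$
\begin{aligned}
\textit{middle}(\varnothing) &= \varnothing \\
\textit{middle}(w) &= \begin{cases} \varnothing & \text{if } |w| \le 1 \\ \{(a, b)\} \cup \textit{middle}(bx) & \text{if } w = abx \end{cases} \\
\textit{middle}(S + T) &= \textit{middle}(S) \cup \textit{middle}(T) \\
\textit{middle}(ST) &= \textit{middle}(S) \cup (\textit{last}(S) \times \textit{first}(T)) \cup \textit{middle}(T) \\
\textit{middle}(S^*) &= \textit{middle}(S) \cup (\textit{last}(S) \times \textit{first}(S))
\end{aligned}
$$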

For example, the set middle((1 + 23*4)*) can be computed recursively as follows. If we're doing 
this by hand, we can skip many of the steps in this derivation, because we know what the 
functions first, middle, last, and has-ε actually mean, but a mechanical recursive evaluation would 
necessarily evaluate every step. 

middle((1 + 23*4)*) 

= middle(1 + 23*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

= middle(1) ∪ middle(23*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

= ∅ ∪ middle(23*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

= middle(2) ∪ ( last(2) × first(3*4) ) ∪ middle(3*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

= ∅ ∪ ( {2} × first(3*4) ) ∪ middle(3*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

= ( {2} × ( first(3*) ∪ first(4) ) ) ∪ middle(3*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

= ( {2} × ( first(3) ∪ first(4) ) ) ∪ middle(3*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

= ( {2} × {3, 4} ) ∪ middle(3*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

= {(2, 3), (2, 4)} ∪ middle(3*4) ∪ ( last(1 + 23*4) × first(1 + 23*4) ) 

  ⋮ 

= {(1, 1), (1, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 1), (4, 2)} 
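Finally, given any regular expression R, Glushkov's algorithm constructs the NFA M_R = 
(Σ, Q, s, A, δ) that accepts exactly the language L(R) as follows: 

$$
\begin{aligned}
Q &= \{0, 1, \ldots, |\textit{symbols}(R)|\} \\
s &= 0 \\
A &= \begin{cases} \{0\} \cup \textit{last}(\textit{index}(R)) & \text{if } \textit{has-}\varepsilon(R) \\ \textit{last}(\textit{index}(R)) & \text{otherwise} \end{cases} \\
\delta(0, a) &= \{\, j \in \textit{first}(\textit{index}(R)) \mid a = \textit{symbols}(R)[j] \,\} \\
\delta(i, a) &= \{\, j \mid (i, j) \in \textit{middle}(\textit{index}(R)) \text{ and } a = \textit{symbols}(R)[j] \,\}
\end{aligned}
$$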

There are a few natural ways to think about Glushkov's algorithm that are somewhat less 
impenetrable than the previous wall of definitions. One viewpoint is that Glushkov's algorithm 
first computes a DFA for the indexed regular expression index(R) (in fact, a DFA with the 
fewest possible states, except for an extra start state) and then replaces each index with the 
corresponding symbol in symbols(R) to get an NFA for the original expression R. Another useful 
observation is that Glushkov's NFA is identical to the result of removing all ε-transitions from 
Thompson's NFA for the same regular expression. 
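
For readers who prefer code to recurrences, here is a rough Python sketch of Glushkov's construction. The nested-tuple encoding of regular expressions (("sym", a), ("cat", S, T), ("alt", S, T), ("star", S), ("eps",), ("empty",)), the dictionary returned, and all function names are assumptions made for this illustration; strings of several symbols are handled as nested concatenations of single symbols.

```python
def glushkov(regex):
    """Build Glushkov's NFA for a regex given as nested tuples (assumed encoding)."""
    symbols = [None]              # symbols[j] is the j-th symbol of R, 1-indexed

    def index(r):
        if r[0] == "sym":
            symbols.append(r[1])
            return ("sym", len(symbols) - 1)
        if r[0] in ("cat", "alt"):
            left = index(r[1])    # number the left operand first
            right = index(r[2])
            return (r[0], left, right)
        if r[0] == "star":
            return ("star", index(r[1]))
        return r                  # ("eps",) or ("empty",)

    def has_eps(r):
        if r[0] in ("eps", "star"):
            return True
        if r[0] == "alt":
            return has_eps(r[1]) or has_eps(r[2])
        if r[0] == "cat":
            return has_eps(r[1]) and has_eps(r[2])
        return False

    def first(r):
        if r[0] == "sym":
            return {r[1]}
        if r[0] == "alt":
            return first(r[1]) | first(r[2])
        if r[0] == "cat":
            return (first(r[1]) | first(r[2])) if has_eps(r[1]) else first(r[1])
        if r[0] == "star":
            return first(r[1])
        return set()

    def last(r):
        if r[0] == "sym":
            return {r[1]}
        if r[0] == "alt":
            return last(r[1]) | last(r[2])
        if r[0] == "cat":
            return (last(r[1]) | last(r[2])) if has_eps(r[2]) else last(r[2])
        if r[0] == "star":
            return last(r[1])
        return set()

    def middle(r):
        if r[0] == "alt":
            return middle(r[1]) | middle(r[2])
        if r[0] == "cat":
            cross = {(i, j) for i in last(r[1]) for j in first(r[2])}
            return middle(r[1]) | cross | middle(r[2])
        if r[0] == "star":
            return middle(r[1]) | {(i, j) for i in last(r[1]) for j in first(r[1])}
        return set()

    indexed = index(regex)
    accepting = last(indexed) | ({0} if has_eps(indexed) else set())
    delta = {}
    for j in first(indexed):                       # transitions out of the start state 0
        delta.setdefault((0, symbols[j]), set()).add(j)
    for i, j in middle(indexed):                   # transitions between symbol states
        delta.setdefault((i, symbols[j]), set()).add(j)
    return {"states": len(symbols), "accepting": accepting, "delta": delta,
            "first": first(indexed), "last": last(indexed), "middle": middle(indexed)}

# (0 + 10*1)* as a nested tuple.
even_ones = ("star", ("alt", ("sym", "0"),
                      ("cat", ("sym", "1"),
                       ("cat", ("star", ("sym", "0")), ("sym", "1")))))
even_ones_glushkov = glushkov(even_ones)
```

State j of the resulting NFA means "the most recently read symbol was the j-th symbol of R", which is why every transition into state j carries the label symbols(R)[j].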

For example, given the regular expression R = (0 + 10*1)*, Glushkov's algorithm computes 

    index(R) = (1 + 23*4)* 
    symbols(R) = 0101 
    has-ε(R) = True 
    first(index(R)) = {1, 2} 
    last(index(R)) = {1, 4} 
    middle(index(R)) = {(1, 1), (1, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 1), (4, 2)} 

and then constructs the following five-state NFA. 

Glushkov's DFA for the index expression (1 + 23*4)* and Glushkov's NFA for the regular expression (0 + 10*1)*. 

Hey look, Glushkov's algorithm actually gave us a DFA! In fact, it gave us precisely the same 
DFA that we constructed earlier by sending Thompson's NFA through the incremental subset 
algorithm! Unfortunately, that's just a coincidence; in general the output of Glushkov's algorithm 
is not deterministic. We'll see a more typical example in the next section. 
4.8 Another Example 

Here is another example of all the algorithms we've seen so far, starting with the regular 
expression (0 + 1)*(00 + 11)(0 + 1)*, which describes the language accepted by our very first 
example NFA. Thompson's algorithm constructs the following 26-state monster: 




Thompson's NFA for the regular expression (0 + 1)*(00 + 11)(0 + 1)* 

Given this NFA as input, the incremental subset construction computes the following table, 
leading to a DFA with just nine states. Yeah, the ε-reaches get a bit ridiculous; unfortunately, this 
is typical for Thompson's NFA. 



    q'     ε-reach(q')            q' ∈ A'?   δ'(q', 0)   δ'(q', 1)
    s      sabdghim                          cj          en
    cj     sabdfghijkm                       cjl         en
    en     sabdfghmno                        cj          enp
    cjl    sabdfghijklmqrtuwz     ✓          cjlv        enx
    enp    sabdfghmnopqrtuwz      ✓          cjv         enpx
    cjlv   sabdfghijklmqrtuvwyz   ✓          cjlv        enx
    enx    sabdfghmnopqrtuwxyz    ✓          cjv         enpx
    cjv    sabdfghijkmrtuvwyz     ✓          cjlv        enx
    enpx   sabdfghmnopqrtuwxyz    ✓          cjv         enpx



The DFA computed by the incremental subset algorithm from Thompson's NFA for (0 + 1)*(00 + 11)(0 + 1)*. 

This DFA has far more states than necessary, intuitively because it keeps looking for 00 and 
11 substrings even after it's already found one. After all, when Thompson's NFA finds a 00 or 
11 substring, it doesn't kill all the other parallel threads, because it can't. NFAs often have 
significantly fewer states than equivalent DFAs, but that efficiency also makes them kind of 
stupid. 

Glushkov's algorithm recursively computes the following values for the same regular expression 
R = (0 + 1)*(00 + 11)(0 + 1)*: 

    index(R) = (1 + 2)*(34 + 56)(7 + 8)* 
    symbols(R) = 01001101 
    has-ε(R) = False 
    first(index(R)) = {1, 2, 3, 5} 
    last(index(R)) = {4, 6, 7, 8} 
    middle(index(R)) = {(1, 1), (1, 2), (2, 1), (2, 2), (1, 3), (1, 5), (2, 3), (2, 5), (3, 4), 
                        (5, 6), (4, 7), (4, 8), (6, 7), (6, 8), (7, 7), (7, 8), (8, 7), (8, 8)} 

These values imply the nine-state NFA shown below. Careful readers should confirm that running 
the incremental subset construction on this NFA yields exactly the same DFA (with different state 
names) as it did for Thompson's NFA. 

Glushkov's NFA for (0 + 1)*(00 + 11)(0 + 1)* 

*4.9 Regular Expressions from NFAs: Han and Wood's Algorithm 

The only component of Kleene's theorem we still have to prove is that every language accepted 
by a DFA or NFA is regular. As usual, it is actually easier to prove a stronger result. We consider a 
more general class of finite-state machines called expression automata, introduced by Yo-Sub 
Han and Derick Wood in 2005.² Formally, an expression automaton consists of the following 
components: 

• A finite set Σ called the input alphabet 

• Another finite set Q whose elements are called states 

• A start state s ∈ Q 

• A single terminal state t ∈ Q \ {s} 

• A transition function R : (Q \ {t}) × (Q \ {s}) → Reg(Σ), where Reg(Σ) is the set of regular 
expressions over Σ 

Less formally, an expression automaton is a directed graph that includes a directed edge p→q 
labeled with a regular expression R(p→q), from every vertex p to every vertex q (including q = p), 
where by convention, we require that R(q→s) = R(t→q) = ∅ for every vertex q. 

²Yo-Sub Han* and Derick Wood. The generalization of generalized automata: Expression automata. International 
Journal of Foundations of Computer Science 16(3):499-510, 2005. 
We say that a string w matches a transition p→q if w matches the regular expression R(p→q). In 
particular, if R(p→q) = ∅, then no string matches p→q. More generally, w matches a sequence of 
states q_0→q_1→⋯→q_k if w matches the regular expression R(q_0→q_1) • R(q_1→q_2) • ⋯ • R(q_{k−1}→q_k). 
Equivalently, w matches the sequence q_0→q_1→⋯→q_k if either 

• w = ε and the sequence has only one state (k = 0), or 

• w = xy for some string x that matches the regular expression R(q_0→q_1) and some string 
y that matches the remaining sequence q_1→⋯→q_k. 

An expression automaton accepts any string that matches at least one sequence of states that 
starts at s and ends at t. The language of an expression automaton E is the set of all strings that 
E accepts. 

Expression automata are nondeterministic. A single string could match several (even infinitely 
many) state sequences that start with s, and it could match each of those state sequences in 
several different ways. A string is accepted if at least one of the state sequences it matches ends 
at t. Conversely, a string might match no state sequences; all such strings are rejected. 

Two special cases of expression automata are already familiar. First, every regular language 
is clearly the language of an expression automaton with exactly two states. Second, with only 
minor modifications, any DFA or NFA can be converted into an expression automaton with 
trivial transition expressions. Thompson's algorithm can be used to transform any expression 
automaton into an NFA, by recursively expanding any nontrivial transition. To complete the 
proof of Kleene's theorem, we show how to convert any expression automaton into a regular 
expression by repeatedly deleting vertices. 

Lemma 4.3. Every expression automaton accepts a regular language. 

Proof: Let E = (Q, Σ, R, s, t) be an arbitrary expression automaton. Assume that any expression 
automaton with fewer states than E accepts a regular language. There are two cases to consider, 
depending on the number of states in Q: 

• If Q = {s, t}, then trivially, E accepts the regular language R(s→t). 

• On the other hand, suppose there is a state q ∈ Q \ {s, t}. We can modify the expression 
automaton so that state q is redundant and can be removed. Define a new transition 
function R' : Q × Q → Reg(Σ) by setting 

    R'(p→r) := R(p→r) + R(p→q) R(q→q)* R(q→r). 

With this modified transition function in place, any string w that matches the sequence 
p→q→q→⋯→q→r with any number of q's also matches the single transition p→r. Thus, 
by induction, if w matches a sequence of states, it also matches the subsequence obtained 
by removing all q's. Let E' be the expression automaton with states Q' = Q \ {q} that uses 
this modified transition function R'. This new automaton accepts exactly the same strings 
as the original automaton E. Because E' has fewer states than E, the inductive hypothesis 
implies E' accepts a regular language. 

In both cases, we conclude that E accepts a regular language. □ 

This proof can be mechanically translated into an algorithm to convert any NFA (in particular, 
any DFA) into an equivalent regular expression. Given an NFA with n states (including s and 
t), the algorithm iteratively removes n − 2 states, updating O(n^2) transition expressions in each 
iteration. If the concatenation and Kleene star operations could be performed in constant time, 
the resulting algorithm would run in O(n^3) time. However, in each iteration, the transition 
expressions grow in length by roughly a factor of 4 in the worst case, so the final expression 
has length Θ(4^n). If we insist on representing the expressions as explicit strings, the worst-case 
running time is actually Θ(4^n). 
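
A direct, unoptimized Python sketch of this state-elimination algorithm follows. It makes several assumptions that are not in the notes: expressions are explicit strings in the syntax of Python's re module (so union is written | instead of +, wrapped in non-capturing groups), None stands for ∅, the empty string stands for ε, and the new start and terminal states are anonymous objects.

```python
import re
from itertools import product

def union(x, y):
    """Union of two expressions; None stands for the empty language."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"

def star(x):
    """Kleene star; the star of the empty language or of ε is just ε."""
    return "" if not x else f"(?:{x})*"

def nfa_to_regex(states, delta, start, accepting):
    """delta maps (state, symbol) to a set of states; returns a regex string, or None for ∅."""
    S, T = object(), object()          # new start and terminal states
    R = {(S, start): ""}
    for q in accepting:
        R[(q, T)] = ""
    for (p, a), targets in delta.items():
        for q in targets:
            R[(p, q)] = union(R.get((p, q)), re.escape(a))
    remaining = list(states)
    while remaining:
        q = remaining.pop()            # the state to remove in this iteration
        loop = star(R.get((q, q)))
        others = remaining + [S, T]
        for p, r in product(others, repeat=2):
            into, out = R.get((p, q)), R.get((q, r))
            if into is not None and out is not None:
                # R'(p→r) := R(p→r) + R(p→q) R(q→q)* R(q→r)
                R[(p, r)] = union(R.get((p, r)), into + loop + out)
        for key in [k for k in R if q in k]:
            del R[key]                 # drop every transition touching q
    return R.get((S, T))

# Hypothetical two-state DFA for an even number of 1s.
parity_delta = {("e", "0"): {"e"}, ("e", "1"): {"o"},
                ("o", "0"): {"o"}, ("o", "1"): {"e"}}
parity_regex = nfa_to_regex(["e", "o"], parity_delta, "e", {"e"})
```

For this two-state DFA the sketch produces a pattern equivalent to (0 + 10*1)*; on larger machines the output grows quickly, as the length bound above predicts.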

A figure on the next page shows this conversion algorithm in action for a simple DFA. First 
we convert the DFA into an expression automaton by adding new start and accept states and 
merging two transitions, and then we remove each of the three original states, updating the 
transition expressions between any remaining states at each iteration. For the sake of clarity, 
edges p→q with R(p→q) = ∅ are omitted from the figures. 

Converting a DFA into an equivalent regular expression. 



Exercises 

1. For each of the following NFAs, describe an equivalent DFA. ("Describe" does not necessarily 
mean "draw"!) 



*** 



Half a dozen examples. 



2. For each of the following regular expressions, draw an equivalent NFA. 



*** 



Half a dozen examples. 

3. For each of the following regular expressions, describe an equivalent DFA. ("Describe" does 
not necessarily mean "draw"!) 



*** 



Half a dozen examples. 



4. Let L ⊆ Σ* be an arbitrary regular language. Prove that the following languages are 
regular. 

(a) ones(L) := {w ∈ 1* | |w| = |x| for some x ∈ L} 

(b) reverse(L) := {w ∈ Σ* | w^R ∈ L}. (Recall that w^R denotes the reversal of string w.) 

(c) prefix(L) := {x ∈ Σ* | xy ∈ L for some y ∈ Σ*} 

(d) suffix(L) := {y ∈ Σ* | xy ∈ L for some x ∈ Σ*} 

(e) substring(L) := {y ∈ Σ* | xyz ∈ L for some x, z ∈ Σ*} 

(f) cycle(L) := {xy | x, y ∈ Σ* and yx ∈ L} 

(g) prefmax(L) := {x ∈ L | xy ∈ L ⟺ y = ε}. 

(h) sufmin(L) := {xy ∈ L | y ∈ L ⟺ x = ε}. 

(i) everyother(L) := {everyother(w) | w ∈ L}, where everyother(w) is the subsequence of 
w containing every other symbol. For example, everyother(EVERYOTHER) = VROHR. 

(j) rehtoyreve(L) := {w ∈ Σ* | everyother(w) ∈ L}. 

(k) subseq(L) := {x ∈ Σ* | x is a subsequence of some y ∈ L} 

(l) superseq(L) := {x ∈ Σ* | some y ∈ L is a subsequence of x} 

(m) left(L) := {x ∈ Σ* | xy ∈ L for some y ∈ Σ* where |x| = |y|} 

(n) right(L) := {y ∈ Σ* | xy ∈ L for some x ∈ Σ* where |x| = |y|} 

(o) middle(L) := {y ∈ Σ* | xyz ∈ L for some x, z ∈ Σ* where |x| = |y| = |z|} 

(p) half(L) := {w ∈ Σ* | ww ∈ L} 

(q) third(L) := {w ∈ Σ* | www ∈ L} 

(r) reflect(L) := {w ∈ Σ* | w w^R ∈ L} 

*(s) sqrt(L) := {x ∈ Σ* | xy ∈ L for some y ∈ Σ* such that |y| = |x|^2} 

*(t) log(L) := {x ∈ Σ* | xy ∈ L for some y ∈ Σ* such that |y| = 2^|x|} 

*(u) flog(L) := {x ∈ Σ* | xy ∈ L for some y ∈ Σ* such that |y| = F_|x|}, where F_n is the 
nth Fibonacci number. 



*5. Let L ⊆ Σ* be an arbitrary regular language. Prove that the following languages are regular. 
[Hint: For each language, there is an accepting NFA with at most q^q states, where q is the 
number of states in some DFA that accepts L.] 

(a) repeat(L) := {w ∈ Σ* | w^n ∈ L for some n ≥ 0} 

(b) allreps(L) := {w ∈ Σ* | w^n ∈ L for every n ≥ 0} 

(c) manyreps(L) := {w ∈ Σ* | w^n ∈ L for infinitely many n ≥ 0} 

(d) fewreps(L) := {w ∈ Σ* | w^n ∈ L for finitely many n ≥ 0} 

(e) powers(L) := {w ∈ Σ* | w^(2^n) ∈ L for some n ≥ 0} 

*(f) whatthe_N(L) := {w ∈ Σ* | w^n ∈ L for some n ∈ N}, where N is an arbitrary fixed set 
of non-negative integers. [Hint: You only have to prove that an accepting NFA exists; 
you don't have to describe how to construct it.] 



6. For each of the following expression automata, describe an equivalent DFA and an equivalent 
regular expression. 



*** 



Half a dozen examples. 

Caveat lector: This is the first edition of this lecture note. Please send bug reports and 
suggestions to jeffe@illinois.edu. 



Imagine a piano keyboard, eh, 88 keys, only 88 and yet, and yet, hundreds of new melodies, 
new tunes, new harmonies are being composed upon hundreds of different keyboards every 
day in Dorset alone. Our language, tiger, our language: hundreds of thousands of available 
words, frillions of legitimate new ideas, so that I can say the following sentence and be 
utterly sure that nobody has ever said it before in the history of human communication: 
"Hold the newsreader's nose squarely, waiter, or friendly milk will countermand 
my trousers. " Perfectly ordinary words, but never before put in that precise order. A unique 
child delivered of a unique mother. 

— Stephen Fry, A Bit of Fry and Laurie, Series 1, Episode 3 (1989) 



5 Context-Free Languages and Grammars 
5.1 Definitions 

Intuitively, a language is regular if it can be built from individual strings by concatenation, union, 
and repetition. In this note, we consider a wider class of context-free languages, which are 
languages that can be built from individual strings by concatenation, union, and recursion. 

Formally, a language is context-free if and only if it has a certain type of recursive description 
known as a context-free grammar, which is a structure with the following components: 

• A finite set Σ, whose elements are called symbols or terminals. 

• A finite set Γ disjoint from Σ, whose elements are called non-terminals. 

• A finite set R of production rules of the form A → w, where A ∈ Γ is a non-terminal and 
w ∈ (Σ ∪ Γ)* is a string of symbols and variables. 

• A starting non-terminal, typically denoted S. 

For example, the following eight production rules describe a context-free grammar with terminals 
Σ = {0, 1} and non-terminals Γ = {S, A, B, C}: 

    S → A    A → 0A    B → B1    C → ε 
    S → B    A → 0C    B → C1    C → 0C1 

Normally we write grammars more compactly by combining the right sides of all rules for 
each non-terminal into one list, with alternatives separated by vertical bars.¹ For example, the 
previous grammar can be written more compactly as follows: 

    S → A | B 
    A → 0A | 0C 
    B → B1 | C1 
    C → ε | 0C1 

¹Yes, this means we now have three symbols ∪, +, and | with exactly the same meaning. Sigh. 

© Copyright 2014 Jeff Erickson. 
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). 
Free distribution is strongly encouraged; commercial distribution is expressly forbidden. 
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision. 

For the rest of this lecture, I will almost always use the following notational conventions. 

• Monospaced digits (0, 1, 2, …), and symbols (⋄, $, #, •, …) are explicit terminals. 

• Early lower-case Latin letters (a, b, c, …) represent unknown or arbitrary terminals in Σ. 

• Upper-case Latin letters (A, B, C, …) and the letter S represent non-terminals in Γ. 

• Late lower-case Latin letters (…, w, x, y, z) represent strings in (Σ ∪ Γ)*, whose characters 
could be either terminals or non-terminals. 

We can apply a production rule to a string in (Σ ∪ Γ)* by replacing any instance of the 
non-terminal on the left of the rule with the string on the right. More formally, for any strings 
x, y, z ∈ (Σ ∪ Γ)* and any non-terminal A ∈ Γ, applying the production rule A → y to the string 
xAz yields the string xyz. We use the notation xAz ⇝ xyz to describe this application. For 
example, we can apply the rule C → 0C1 to the string 00C1BAC0 in two different ways: 

    00C1BAC0 ⇝ 000C11BAC0        00C1BAC0 ⇝ 00C1BA0C10 

More generally, for any strings x, z ∈ (Σ ∪ Γ)*, we say that z derives from x, written x ⇝* z, 
if we can transform x into z by applying a finite sequence of production rules, or more formally, 
if either 

• x = z, or 

• x ⇝ y and y ⇝* z for some string y ∈ (Σ ∪ Γ)*. 

Straightforward definition-chasing implies that, for any strings w, x, y, z ∈ (Σ ∪ Γ)*, if x ⇝* y, 
then wxz ⇝* wyz. 

The language L(w) of any string w ∈ (Σ ∪ Γ)* is the set of all strings in Σ* that derive from w: 

    L(w) := {x ∈ Σ* | w ⇝* x}. 

The language generated by a context-free grammar G, denoted L(G), is the language of its 
starting non-terminal. Finally, a language is context-free if it is generated by some context-free 
grammar. 
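
The definitions of ⇝ and ⇝* translate into a small search procedure. The sketch below is only an illustration under assumed conventions: every terminal and non-terminal is a single character, a grammar is a dictionary from non-terminals to lists of right-hand sides, and the function name and both length cutoffs are hypothetical. It always expands the leftmost non-terminal, which suffices to reach every terminal string, and the cap on the length of intermediate strings is a crude heuristic that keeps the search finite.

```python
from collections import deque

def generate(rules, start, max_len):
    """Terminal strings of length at most max_len that derive from `start`."""
    nonterminals = set(rules)
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        x = queue.popleft()
        # Position of the leftmost non-terminal in x, if there is one.
        pos = next((i for i, c in enumerate(x) if c in nonterminals), None)
        if pos is None:
            language.add(x)             # no non-terminals left: x is a terminal string
            continue
        for rhs in rules[x[pos]]:
            y = x[:pos] + rhs + x[pos + 1:]     # apply the rule x[pos] -> rhs
            n_terminals = sum(c not in nonterminals for c in y)
            # Heuristic cutoffs (assumptions): too many terminals, or too long overall.
            if n_terminals <= max_len and len(y) <= 2 * max_len + 2 and y not in seen:
                seen.add(y)
                queue.append(y)
    return language

# The first example grammar; "" encodes ε.
example_rules = {
    "S": ["A", "B"],
    "A": ["0A", "0C"],
    "B": ["B1", "C1"],
    "C": ["", "0C1"],
}
short_strings_of_C = generate(example_rules, "C", 6)
short_strings_of_S = generate(example_rules, "S", 4)
```

Printing sorted(short_strings_of_S, key=len) is a quick way to experiment with a grammar before trying to characterize its language by hand.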

Context-free grammars are sometimes used to model natural languages. In this context, the 
symbols are words, and the strings in the languages are sentences. For example, the following 
grammar describes a simple subset of English sentences. (Here I diverge from the usual notation 
conventions. Strings in ⟨angle brackets⟩ are non-terminals, and regular strings are terminals.) 

    ⟨sentence⟩ → ⟨noun phrase⟩ ⟨verb phrase⟩ ⟨noun phrase⟩ 
    ⟨noun phrase⟩ → ⟨adj. phrase⟩ ⟨noun⟩ 
    ⟨adj. phrase⟩ → ⟨article⟩ | ⟨possessive⟩ | ⟨adj. phrase⟩ ⟨adjective⟩ 
    ⟨verb phrase⟩ → ⟨verb⟩ | ⟨adverb⟩ ⟨verb phrase⟩ 

    ⟨noun⟩ → dog | trousers | daughter | nose | homework | time lord | pony | ⋯ 
    ⟨article⟩ → the | a | some | every | that | ⋯ 
    ⟨possessive⟩ → ⟨noun phrase⟩'s | my | your | his | her | ⋯ 
    ⟨adjective⟩ → friendly | furious | moist | green | severed | timey-wimey | little | ⋯ 
    ⟨verb⟩ → ate | found | wrote | killed | mangled | saved | invented | broke | ⋯ 
    ⟨adverb⟩ → squarely | incompetently | barely | sort of | awkwardly | totally | ⋯ 






Algorithms 



Lecture 5: Context-Free Languages and Grammars [Fa'14] 



5.2 Parse Trees 

It is often useful to visualize derivations of strings in L(G) using a parse tree. The parse tree for 
a string w e L(G) is a rooted ordered tree where 

• Each leaf is labeled with a terminal or the empty string ε. Concatenating these in order 
from left to right yields the string w. 

• Each internal node is labeled with a non-terminal. In particular, the root is labeled with 
the start non-terminal S. 

• For each internal node v, there is a production rule A → ω where A is the label of v and 
the symbols in ω are the labels of the children of v in order from left to right. 

In other words, the production rules of the grammar describe template trees that can be 
assembled into larger parse trees. For example, the simple grammar on the previous page has 
the following templates, one for each production rule: 

     S    S     A       A       B       B     C      C 
     |    |    / \     / \     / \     / \    |    / | \ 
     A    B   0   A   0   C   B   1   C   1   ε   0  C  1 

The same grammar gives us the following parse tree for the string 000011: 

          S 
          | 
          A 
         / \ 
        0   A 
           / \ 
          0   C 
            / | \ 
           0  C  1 
            / | \ 
           0  C  1 
              | 
              ε 

Our more complicated "English" grammar gives us parse trees like the following: 

(Each node's children are indented below it, in order from left to right.) 

    ⟨sentence⟩ 
        ⟨noun phrase⟩ 
            ⟨adj. phrase⟩ 
                ⟨adj. phrase⟩ 
                    ⟨adj. phrase⟩ 
                        ⟨possessive⟩ 
                            your 
                    ⟨adjective⟩ 
                        furious 
                ⟨adjective⟩ 
                    green 
            ⟨noun⟩ 
                time lord 
        ⟨verb phrase⟩ 
            ⟨adverb⟩ 
                barely 
            ⟨verb phrase⟩ 
                ⟨verb⟩ 
                    mangled 
        ⟨noun phrase⟩ 
            ⟨adj. phrase⟩ 
                ⟨possessive⟩ 
                    ⟨noun phrase⟩ 
                        ⟨adj. phrase⟩ 
                            ⟨possessive⟩ 
                                my 
                        ⟨noun⟩ 
                            dog 
                    's 
            ⟨noun⟩ 
                trousers 

Any parse tree that contains at least one node with more than one non-terminal child 
corresponds to several different derivations. For example, when deriving an "English" sentence, 
we have a choice of whether to expand the first ⟨noun phrase⟩ ("your furious green time lord") 
before or after the second ("my dog's trousers"). 

A string w is ambiguous with respect to a grammar if there is more than one parse tree for 
w, and a grammar G is ambiguous if some string is ambiguous with respect to G. Neither of the 
previous example grammars is ambiguous. However, the grammar S → 1 | S+S is ambiguous, 
because the string 1+1+1+1 has five different parse trees: 

[Figure: the five parse trees for 1+1+1+1, one for each of the groupings ((1+1)+1)+1, 
(1+(1+1))+1, (1+1)+(1+1), 1+((1+1)+1), and 1+(1+(1+1)).] 


A context-free language L is inherently ambiguous if every context-free grammar that 
generates L is ambiguous. The language generated by the previous grammar (the regular 
language (1+)*1) is not inherently ambiguous, because the unambiguous grammar S — > 1 | 1+S 
generates the same language. 
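
To see where the count of five comes from, here is a small Python sketch (not part of the 
original notes; the function names are invented for illustration). It counts parse trees for the 
string 1+1+...+1 with k ones under each of the two grammars. 

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_ambiguous(k: int) -> int:
    """Number of parse trees for 1+1+...+1 (k ones) under S -> 1 | S+S."""
    if k == 1:
        return 1  # only the rule S -> 1 applies
    # The root applies S -> S+S; the left subtree derives the first i ones.
    return sum(count_ambiguous(i) * count_ambiguous(k - i) for i in range(1, k))

def count_unambiguous(k: int) -> int:
    """Number of parse trees for the same string under S -> 1 | 1+S."""
    # Each step is forced: S -> 1+S while more than one 1 remains, then S -> 1.
    return 1 if k == 1 else count_unambiguous(k - 1)

print(count_ambiguous(4), count_unambiguous(4))  # prints: 5 1
```

The first count grows quickly with k, while the second grammar always yields exactly one tree. 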



5.3 From Grammar to Language 

Let's figure out the language generated by our first example grammar 

$$S \to A \mid B \qquad A \to 0A \mid 0C \qquad B \to B1 \mid C1 \qquad C \to \varepsilon \mid 0C1.$$

Since the production rules for non-terminal C do not refer to any other non-terminal, let's begin 
by figuring out L(C). After playing around with the smaller grammar $C \to \varepsilon \mid 0C1$ for a few 
seconds, you can probably guess that its language is $\{\varepsilon, 01, 0011, 000111, \ldots\}$, that is, the set 
of all strings of the form $0^n1^n$ for some integer $n \ge 0$. For example, we can derive the string 
00001111 from the start non-terminal C using the following derivation: 

$$C \rightsquigarrow 0C1 \rightsquigarrow 00C11 \rightsquigarrow 000C111 \rightsquigarrow 0000C1111 \rightsquigarrow 0000\varepsilon1111 = 00001111$$

The same derivation can be viewed as the following parse tree: 

[Figure: parse tree for 00001111, rooted at C.] 




In fact, it is not hard to prove by induction that $L(C) = \{0^n1^n \mid n \ge 0\}$ as follows. As usual when 
we prove that two sets X and Y are equal, the proof has two stages: one stage to prove $X \subseteq Y$, 
the other to prove $Y \subseteq X$. 

• First we prove that $C \rightsquigarrow^* 0^n1^n$ for every non-negative integer n. 

Fix an arbitrary non-negative integer n. Assume that $C \rightsquigarrow^* 0^k1^k$ for every non-negative 
integer $k < n$. There are two cases to consider. 

- If n = 0, then $0^n1^n = \varepsilon$. The rule $C \to \varepsilon$ implies that $C \rightsquigarrow \varepsilon$ and therefore $C \rightsquigarrow^* \varepsilon$. 

- Suppose n > 0. The inductive hypothesis implies that $C \rightsquigarrow^* 0^{n-1}1^{n-1}$. Thus, the rule 
$C \to 0C1$ implies that $C \rightsquigarrow 0C1 \rightsquigarrow^* 0(0^{n-1}1^{n-1})1 = 0^n1^n$. 

In both cases, we conclude that $C \rightsquigarrow^* 0^n1^n$, as claimed. 

• Next we prove that for every string $w \in \Sigma^*$ such that $C \rightsquigarrow^* w$, we have $w = 0^n1^n$ for some 
non-negative integer n. 

Fix an arbitrary string w such that $C \rightsquigarrow^* w$. Assume that for any string x such that $|x| < |w|$ 
and $C \rightsquigarrow^* x$, we have $x = 0^k1^k$ for some non-negative integer k. There are two cases to 
consider, one for each production rule. 

- If $w = \varepsilon$, then $w = 0^01^0$. 

- Suppose w = 0x1 for some string x such that $C \rightsquigarrow^* x$. Because $|x| = |w| - 2 < |w|$, 
the inductive hypothesis implies that $x = 0^k1^k$ for some integer k. Then we have 
$w = 0^{k+1}1^{k+1}$. 

In both cases, we conclude that $w = 0^n1^n$ for some non-negative integer n, as claimed. 

The first proof uses induction on strings, following the boilerplate proposed in the previous 
lecture; in particular, the case analysis mirrors the recursive definition of "string". The second 
proof uses structural induction on the grammar; the case analysis mirrors the recursive definition 
of the language of C, as described by the production rules. In both proofs, the inductive hypothesis 
is "Assume there is no smaller counterexample." 

Similar analysis implies that $L(A) = \{0^m1^n \mid m > n\}$ and $L(B) = \{0^m1^n \mid m < n\}$, and 
therefore $L(S) = \{0^m1^n \mid m \ne n\}$. 
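
As a sanity check on these claims (not a substitute for the proofs), the following Python sketch 
enumerates every string of bounded length derivable from the example grammar and compares 
the result with $\{0^m1^n \mid m \ne n\}$. The representation of the grammar as a dictionary and the name 
`generate` are assumptions made for this illustration. The search terminates because every rule 
of this particular grammar has at most one non-terminal on its right side. 

```python
from collections import deque

# Production rules of the example grammar; "" stands for the empty string.
RULES = {
    "S": ["A", "B"],
    "A": ["0A", "0C"],
    "B": ["B1", "C1"],
    "C": ["", "0C1"],
}

def generate(start: str, max_len: int) -> set[str]:
    """All terminal strings of length <= max_len derivable from `start`."""
    seen, results = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if any.
        pos = next((i for i, c in enumerate(form) if c in RULES), None)
        if pos is None:
            results.add(form)
            continue
        for rhs in RULES[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals are never erased, so forms with too many are hopeless.
            if sum(c not in RULES for c in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

expected = {"0" * m + "1" * n for m in range(7) for n in range(7 - m) if m != n}
print(generate("S", 6) == expected)
```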



5.4 More Examples 



*** 



Give three or four examples of simple but interesting context-free grammars. Some 
possibilities: 

• Same number of 0s and 1s 

• Different number of 0s and 1s 

• Palindromes 

• Balanced parentheses 

• Arithmetic/algebraic expressions 

• Regular expressions 



5.5 Regular Languages are Context-Free 

The following inductive argument proves that every regular language is also a context-free 
language. Let L be an arbitrary regular language, encoded by some regular expression R. Assume 
that any regular expression shorter than R represents a context-free language. ("Assume no 
smaller counterexample.") We construct a context-free grammar for L as follows. There are 
several cases to consider. 

• Suppose L is empty. Then L is generated by the trivial grammar $S \to S$. 

• Suppose $L = \{w\}$ for some string $w \in \Sigma^*$. Then L is generated by the grammar $S \to w$. 

• Suppose L is the union of some regular languages $L_1$ and $L_2$. The inductive hypothesis 
implies that $L_1$ and $L_2$ are context-free. Let $G_1$ be a context-free grammar for $L_1$ with 
starting non-terminal $S_1$, and let $G_2$ be a context-free grammar for $L_2$ with starting 
non-terminal $S_2$, where the non-terminal sets in $G_1$ and $G_2$ are disjoint. Then $L = L_1 \cup L_2$ is 
generated by the rules of $G_1$ and $G_2$ together with the new production rule $S \to S_1 \mid S_2$. 

• Suppose L is the concatenation of some regular languages $L_1$ and $L_2$. The inductive 
hypothesis implies that $L_1$ and $L_2$ are context-free. Let $G_1$ be a context-free grammar for 
$L_1$ with starting non-terminal $S_1$, and let $G_2$ be a context-free grammar for $L_2$ with starting 
non-terminal $S_2$, where the non-terminal sets in $G_1$ and $G_2$ are disjoint. Then $L = L_1L_2$ is 
generated by the rules of $G_1$ and $G_2$ together with the new production rule $S \to S_1S_2$. 

• Suppose L is the Kleene closure of some regular language $L_1$. The inductive hypothesis 
implies that $L_1$ is context-free. Let $G_1$ be a context-free grammar for $L_1$ with starting 
non-terminal $S_1$. Then $L = L_1^*$ is generated by the rules of $G_1$ together with the new 
production rule $S \to \varepsilon \mid S_1S$. 

In every case, we have found a context-free grammar that generates L, which means L is 
context-free. 
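
The case analysis above translates almost line by line into a recursive procedure. The following 
Python sketch is one possible rendering, under assumptions that are not in the notes: regular 
expressions are given as nested tuples such as `("union", r1, r2)`, non-terminals are represented 
by integers (so they can never collide with terminal strings), and the function name 
`regex_to_cfg` is invented. 

```python
from itertools import count

def regex_to_cfg(regex):
    """Return (start, rules) for a grammar generating the language of `regex`.

    `rules` maps each non-terminal (an int) to a list of right-hand sides;
    each right-hand side is a tuple whose entries are ints (non-terminals)
    or strings (terminals). The empty tuple stands for the empty string.
    """
    fresh = count(1)   # fresh integers keep the sub-grammars disjoint
    rules = {}

    def build(r):
        s = next(fresh)
        kind = r[0]
        if kind == "empty":        # L is empty:      S -> S
            rules[s] = [(s,)]
        elif kind == "str":        # L = {w}:         S -> w
            rules[s] = [(r[1],)] if r[1] else [()]
        elif kind == "union":      # L = L1 u L2:     S -> S1 | S2
            rules[s] = [(build(r[1]),), (build(r[2]),)]
        elif kind == "concat":     # L = L1 L2:       S -> S1 S2
            rules[s] = [(build(r[1]), build(r[2]))]
        elif kind == "star":       # L = L1*:         S -> eps | S1 S
            rules[s] = [(), (build(r[1]), s)]
        else:
            raise ValueError(f"unknown regex node: {kind!r}")
        return s

    return build(regex), rules

# Example: the regular expression (0+1)*1.
start, rules = regex_to_cfg(
    ("concat", ("star", ("union", ("str", "0"), ("str", "1"))), ("str", "1")))
```

Each recursive call adds exactly one new non-terminal and at most two production rules, 
mirroring the five cases of the proof. 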

In the next lecture note, we will prove that the context-free language $\{0^n1^n \mid n \ge 0\}$ is not 
regular. (In fact, this is the canonical example of a non-regular language.) Thus, context-free 
grammars are strictly more expressive than regular expressions. 

5.6 Not Every Language is Context-Free 

Again, you may be tempted to conjecture that every language is context-free, but a variant of our 
earlier cardinality argument implies that this is not the case. 

Any context-free grammar over the alphabet Σ can be encoded as a string over the alphabet 
Σ ∪ Γ ∪ {ε, →, |, $}, where $ indicates the end of the production rules for each non-terminal. For 
example, our example grammar 

    S → A | B      A → 0A | 0C      B → B1 | C1      C → ε | 0C1 

can be encoded as the string 

    S→A|B$A→0A|0C$B→B1|C1$C→ε|0C1$ 

We can further encode any such string as a binary string by associating each symbol in the 
set Σ ∪ Γ ∪ {ε, →, |, $} with a different binary substring. Specifically, if we encode each of the 
grammar symbols ε, →, |, $ as a string of the form 11*0, each terminal in Σ as a string of the 
form 011*0, and each non-terminal as a string of the form 0011*0, we can unambiguously recover 
the grammar from the encoding. For example, applying the code 

    ε ↦ 10       → ↦ 110       | ↦ 1110       $ ↦ 11110 
    0 ↦ 010      1 ↦ 0110 
    S ↦ 0010     A ↦ 00110     B ↦ 001110     C ↦ 0011110 

transforms our example grammar into the 135-bit string 

    00101100011011100011101111000110 
    11001000110111001000111101111000 
    11101100011100110111000111100110 
    11110001111011010111001000111100 
    1011110. 

Adding a 1 to the start of this bit string gives us the binary encoding of the integer 
51 115 617 766 581 763 757 672 062 401 233 529 937 502. 
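
The encoding scheme is easy to mechanize. The following Python sketch applies the code table 
above to the symbol string for the example grammar and shows why decoding is unambiguous: 
every codeword consists of some 0s, then some 1s, then a single 0, so a codeword ends at the 
first 0 that follows a 1. The names `CODE`, `encode`, and `decode` are invented for this 
illustration. 

```python
# Code table from the text: grammar symbols 11*0, terminals 011*0,
# non-terminals 0011*0.
CODE = {
    "ε": "10", "→": "110", "|": "1110", "$": "11110",
    "0": "010", "1": "0110",
    "S": "0010", "A": "00110", "B": "001110", "C": "0011110",
}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(grammar: str) -> str:
    """Concatenate the codewords of the symbols in `grammar`."""
    return "".join(CODE[sym] for sym in grammar)

def decode(bits: str) -> str:
    """Recover the symbol string from its binary encoding."""
    out, start, seen_one = [], 0, False
    for i, b in enumerate(bits):
        if b == "1":
            seen_one = True
        elif seen_one:  # a 0 that follows at least one 1 closes the codeword
            out.append(DECODE[bits[start:i + 1]])
            start, seen_one = i + 1, False
    return "".join(out)

GRAMMAR = "S→A|B$A→0A|0C$B→B1|C1$C→ε|0C1$"
bits = encode(GRAMMAR)
number = int("1" + bits, 2)  # the leading 1 preserves the leading zeros
```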

Our construction guarantees that two different context-free grammars over the same alphabet 
(ignoring changing the names of the non-terminals) yield different positive integers. Thus, the 
set of context-free grammars over any alphabet is at most as large as the set of integers, and is 
therefore countably infinite. (Most integers are not encodings of context-free grammars, but 
that only helps us.) It follows that the set of all context-free languages over any fixed alphabet is 
also countably infinite. But we already showed that the set of all languages over any alphabet is 
uncountably infinite. So almost all languages are non-context-free! 

Although we will probably not see them in this course, there are techniques for proving that 
certain languages are not context-free, just as there are for proving certain languages are not 
regular. In particular, the language $\{0^n1^n0^n \mid n \ge 0\}$ is not context-free. (In fact, this is the 
canonical example of a non-context-free language.) 

*5.7 Recursive Automata 

All the flavors of finite-state automata we have seen so far describe/encode/accept/compute 
regular languages; these are precisely the languages that can be constructed from individual 
strings by union, concatenation, and unbounded repetition. Just as context-free grammars are 
recursive generalizations of regular expressions, we can define a class of machines called recursive 
automata, which generalize (nondeterministic) finite-state automata. 

Formally, a recursive automaton consists of the following components: 

• A non-empty finite set $\Sigma$, called the input alphabet 

• Another non-empty finite set $N$, disjoint from $\Sigma$, whose elements are called module names 

• A start name $S \in N$ 

• A set $M = \{M_A \mid A \in N\}$ of NFAs over the alphabet $\Sigma \cup N$ called modules, each with a single 
accepting state. Each module $M_A$ has the following components: 

- A finite set $Q_A$ of states, such that $Q_A \cap Q_B = \varnothing$ for all $A \ne B$ 

- A start state $s_A \in Q_A$ 

- A terminal or accepting state $t_A \in Q_A$ 

- A transition function $\delta_A : Q_A \times (\Sigma \cup \{\varepsilon\} \cup N) \to 2^{Q_A}$. 

Equivalently, we have a single global transition function $\delta : Q \times (\Sigma \cup \{\varepsilon\} \cup N) \to 2^Q$, where 
$Q = \bigcup_{A \in N} Q_A$, such that for any name A, any state $q \in Q_A$, and any symbol x, we have 
$\delta(q, x) \subseteq Q_A$. Machine $M_S$ is called the main module. 

A configuration of a recursive automaton is a triple $(w, q, \sigma)$, where w is a string in $\Sigma^*$ called 
the input, q is a state in Q called the local state, and $\sigma$ is a string in $Q^*$ called the stack. The 
module containing the local state q is called the active module. A configuration can be changed 
by four types of transitions. 

• A read transition consumes the first symbol in the input and changes the local state within 
the current module, just like a standard NFA. 

• An epsilon transition changes the local state within the current module, without consuming 
any input characters, just like a standard NFA. 

• A call transition chooses an arbitrary name A and some state $q' \in \delta(q, A)$, where q is the 
current local state, pushes $q'$ onto the stack, and makes the start state $s_A$ the new local state 
(thereby changing the active module to $M_A$), without consuming any input characters. 

• Finally, if the current state is the terminal state of some module and the stack is non-empty, 
a return transition pops the top state off the stack and makes it the new local state (thereby 
possibly changing the active module), without consuming any input characters. 

Symbolically, we can describe these transitions as follows: 

$$\begin{array}{lll}
\text{read:} & (ax, q, \sigma) \mapsto (x, q', \sigma) & \text{for some } q' \in \delta(q, a) \\
\text{epsilon:} & (w, q, \sigma) \mapsto (w, q', \sigma) & \text{for some } q' \in \delta(q, \varepsilon) \\
\text{call:} & (w, q, \sigma) \mapsto (w, s_A, q' \cdot \sigma) & \text{for some } A \in N \text{ and } q' \in \delta(q, A) \\
\text{return:} & (w, t_A, q \cdot \sigma) \mapsto (w, q, \sigma) &
\end{array}$$

A recursive automaton accepts a string w if there is a finite sequence of transitions starting at the 
start configuration $(w, s_S, \varepsilon)$ and ending at the terminal configuration $(\varepsilon, t_S, \varepsilon)$. 
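
The definition of acceptance can be turned into a decision procedure. The Python sketch below is 
one way to do it and is not taken from the notes. Searching configurations directly is awkward 
because the stack can grow without bound, so the sketch instead computes, by fixed-point 
iteration, all triples (q, i, j) such that the active module can get from state q with w[i:] unread 
to its accepting state with w[j:] unread, leaving the stack as it found it. The dictionary format, 
the state names, and the example modules (a hypothetical automaton for $\{0^m1^n \mid m \ne n\}$) are 
all assumptions made for illustration. 

```python
def accepts(modules, main, w):
    """modules: name -> (start, accept, delta), where delta maps
    (state, label) to a set of states. A label is an input symbol,
    "" for an epsilon transition, or a module name for a call."""
    n = len(w)
    start = {name: m[0] for name, m in modules.items()}
    edges = [(q, label, p)
             for _, _, delta in modules.values()
             for (q, label), targets in delta.items()
             for p in targets]
    # Base case: an accepting state reaches itself without reading anything.
    done = {(acc, i, i) for _, acc, _ in modules.values() for i in range(n + 1)}
    while True:
        new = set()
        for q, label, p in edges:
            for p2, i, j in done:
                if p2 != p:
                    continue
                if label == "":                      # epsilon transition
                    new.add((q, i, j))
                elif label in modules:               # call, then return to p
                    new |= {(q, h, j) for s, h, k in done
                            if s == start[label] and k == i}
                elif i > 0 and w[i - 1] == label:    # read transition
                    new.add((q, i - 1, j))
        if new <= done:                              # nothing new: fixed point
            return (start[main], 0, n) in done
        done |= new

MODULES = {
    # Main module: either E followed by one or more 1s, or one or more 0s then E.
    "S": ("s", "t", {("s", "E"): {"a"}, ("a", "1"): {"a", "t"},
                     ("s", "0"): {"b"}, ("b", "0"): {"b"}, ("b", "E"): {"t"}}),
    # Module E: the empty string, or 0, a recursive call to E, then 1.
    "E": ("e0", "e1", {("e0", ""): {"e1"}, ("e0", "0"): {"x"},
                       ("x", "E"): {"y"}, ("y", "1"): {"e1"}}),
}

print(accepts(MODULES, "S", "001"), accepts(MODULES, "S", "0011"))
```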

For example, the following recursive automaton accepts the language $\{0^m1^n \mid m \ne n\}$. The 
recursive automaton has two component machines: the start machine named S and a "subroutine" 
named E (for "equal") that accepts the language $\{0^n1^n \mid n \ge 0\}$. White arrows indicate recursive 
transitions. 

[Figure: A recursive automaton for the language $\{0^m1^n \mid m \ne n\}$] 



Lemma 5.1. Every context-free language is accepted by a recursive automaton. 
Proof: 



Direct construction from the CFG, with one module per nonterminal. 



For example, the context-free grammar 

$$\begin{aligned}
A &\to 0A \mid E \\
B &\to B1 \mid E \\
E &\to \varepsilon \mid 0E1
\end{aligned}$$

leads to the following recursive automaton with four modules: 

[Figure!] 

□ 



Lemma 5.2. Every recursive automaton accepts a context-free language. 

Proof (sketch): Let $R = (\Sigma, N, S, \delta, M)$ be an arbitrary recursive automaton. We define a 
context-free grammar G that describes the language accepted by R as follows. 

The set of nonterminals in G is isomorphic to the state set Q; that is, for each state $q \in Q$, the 
grammar contains a corresponding nonterminal $[q]$. The language of $[q]$ will be the set of strings 
w such that there is a finite sequence of transitions starting at the start configuration $(w, q, \varepsilon)$ 
and ending at the terminal configuration $(\varepsilon, t, \varepsilon)$, where t is the terminal state of the module 
containing q. 

The grammar has four types of production rules, corresponding to the four types of transitions: 

• read: For each symbol a and each pair of states p and q such that $p \in \delta(q, a)$, the grammar 
contains the production rule $[q] \to a[p]$. 

• epsilon: For any two states p and q such that $p \in \delta(q, \varepsilon)$, the grammar contains the 
production rule $[q] \to [p]$. 

• call: For each name A and each pair of states p and q such that $p \in \delta(q, A)$, the grammar 
contains the production rule $[q] \to [s_A][p]$. 

• return: For each name A, the grammar contains the production rule $[t_A] \to \varepsilon$. 

Finally, the starting nonterminal of G is $[s_S]$, which corresponds to the start state of the main 
module. 
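
The four rule types can be generated mechanically. Here is a short Python sketch of the 
construction, assuming (purely for illustration) that the automaton is given as a dictionary 
mapping each module name to a triple (start state, accepting state, transition table), and that 
state names double as the nonterminals $[q]$. The function name is invented. 

```python
def automaton_to_grammar(modules, main):
    """Return (start nonterminal, set of rules); a rule is (lhs, rhs-tuple)."""
    rules = set()
    for start, accept, delta in modules.values():
        for (q, label), targets in delta.items():
            for p in targets:
                if label == "":                 # epsilon: [q] -> [p]
                    rules.add((q, (p,)))
                elif label in modules:          # call:    [q] -> [s_A][p]
                    rules.add((q, (modules[label][0], p)))
                else:                           # read:    [q] -> a[p]
                    rules.add((q, (label, p)))
        rules.add((accept, ()))                 # return:  [t_A] -> eps
    return modules[main][0], rules

# A hypothetical one-module automaton for the language of strings 0^n 1^n.
E_ONLY = {
    "E": ("e0", "e1", {("e0", ""): {"e1"}, ("e0", "0"): {"x"},
                       ("x", "E"): {"y"}, ("y", "1"): {"e1"}}),
}
start, rules = automaton_to_grammar(E_ONLY, "E")
```

For this module the sketch produces the rules $[e_0] \to [e_1] \mid 0[x]$, $[x] \to [e_0][y]$, 
$[y] \to 1[e_1]$, and $[e_1] \to \varepsilon$. 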

We can now argue inductively that the grammar G and the recursive automaton R describe 
the same language. Specifically, any sequence of transitions in R from $(w, s_S, \varepsilon)$ to $(\varepsilon, t_S, \varepsilon)$ can be 
transformed mechanically into a derivation of w from the nonterminal $[s_S]$ in G. Symmetrically, 
the leftmost derivation of any string w in G can be mechanically transformed into an accepting 
sequence of transitions in R. We omit the straightforward but tedious details. □ 

For example, the recursive automaton on the previous page gives us the following context-free 
grammar. To make the grammar more readable, I've renamed the nonterminals corresponding to 
start and terminal states: $S = [s_S]$, $T = [t_S]$, and $E = [s_E] = [t_E]$: 

$$\begin{aligned}
S &\to EA \mid 0B & \qquad E &\to \varepsilon \mid 0X \\
A &\to 1A \mid 1T & \qquad X &\to EY \\
B &\to 0B \mid ET & \qquad Y &\to 1Z \\
  &               & \qquad Z &\to E
\end{aligned}$$

Our earlier proofs imply that we can forbid ε-transitions or even allow regular-expression 
transitions in our recursive automata without changing the set of languages they accept. 

*5.8 Chomsky Normal Form 

For many algorithmic problems involving context-free grammars, it is helpful to consider 
grammars with a particular special structure called Chomsky normal form, abbreviated CNF: 

• The starting non-terminal S does not appear on the right side of any production rule. 

• The starting non-terminal S may have the production rule $S \to \varepsilon$. 

• The right side of every other production rule is either a single terminal symbol or a string 
of exactly two non-terminals; that is, every other production rule has the form $A \to BC$ or 
$A \to a$. 

A particularly attractive feature of CNF grammars is that they yield full binary parse trees; in 
particular, every parse tree for a string of length n > 0 has exactly $2n - 1$ non-terminal nodes. 
Consequently, any string of length n in the language of a CNF grammar can be derived in exactly 
$2n - 1$ production steps. It follows that we can actually determine whether a string belongs to 
the language of a CNF grammar by brute-force consideration of all possible derivations of the 
appropriate length. 
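
The brute-force membership test might look like the following Python sketch. It explores 
leftmost derivations of at most $2n - 1$ steps and prunes any sentential form that is already 
longer than the target string or whose terminal prefix disagrees with it. The grammar format 
(a dictionary from non-terminals to lists of tuples) and the function name are assumptions for 
illustration; the example grammar is a CNF grammar for $\{0^m1^n \mid m \ne n\}$. The running time is 
exponential, so this is only suitable for short strings. 

```python
def cnf_member(rules, start, w):
    """rules maps each non-terminal to a list of right-hand sides, each a
    tuple: (a,) for A -> a, (B, C) for A -> BC, and () only for S -> eps."""
    if not w:
        return () in rules[start]

    def derive(form, steps):
        # Position of the leftmost non-terminal, if any.
        i = next((k for k, x in enumerate(form) if x in rules), None)
        if i is None:
            return "".join(form) == w
        # Prune: out of steps, too long, or terminal prefix already wrong.
        if steps == 0 or len(form) > len(w) or "".join(form[:i]) != w[:i]:
            return False
        return any(derive(form[:i] + rhs + form[i + 1:], steps - 1)
                   for rhs in rules[form[i]] if rhs)

    return derive((start,), 2 * len(w) - 1)

CNF_RULES = {
    "S": [("E", "A"), ("E", "C"), ("B", "F"), ("C", "F"), ("0",), ("1",)],
    "A": [("E", "A"), ("E", "C"), ("0",)],
    "B": [("B", "F"), ("C", "F"), ("1",)],
    "C": [("E", "D")],
    "D": [("C", "F"), ("1",)],
    "E": [("0",)],
    "F": [("1",)],
}
print(cnf_member(CNF_RULES, "S", "00011"), cnf_member(CNF_RULES, "S", "0011"))
```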

For arbitrary context-free grammars, there is no similar upper bound on the length of a 
derivation, and therefore no similar brute-force membership algorithm, because the grammar 
may contain additional ε-productions of the form $A \to \varepsilon$ and/or unit productions of the form 
$A \to B$, where both A and B are non-terminals. Unit productions introduce nodes of degree 1 
into any parse tree, and ε-productions introduce leaves that do not contribute to the word being 
parsed. 

Fortunately, it is possible to determine membership in the language of an arbitrary context-free 
grammar, thanks to the following theorem. Two context-free grammars are equivalent if they 
define the same language. 



Every context-free grammar is equivalent to a grammar in Chomsky normal form. 



To be more specific, define the total length of a context-free grammar to be the number of 
symbols needed to write down the grammar; up to constant factors, the total length is the sum 
of the lengths of the production rules. 

Theorem 5.3. For every context-free grammar with total length L, there is an equivalent grammar 
in Chomsky normal form with total length $O(L^2)$, which can be computed in $O(L^2)$ time. 

Converting an arbitrary grammar into Chomsky normal form is a complex task. Fortunately, 
for most applications of context-free grammars, it's enough to know that the algorithm exists. 
For the sake of completeness, however, I will describe one such conversion algorithm here. This 
algorithm consists of several relatively straightforward stages. Efficient implementation of some 
of these stages requires standard graph-traversal algorithms, which we will describe much later 
in the course. 

0. Add a new starting non-terminal. Add a new non-terminal $S'$ and a production rule $S' \to S$, 
where S is the starting non-terminal for the given grammar. $S'$ will be the starting non-terminal 
for the resulting CNF grammar. (In fact, this step is necessary only when $S \rightsquigarrow^* \varepsilon$, but at this point 
in the conversion process, we don't yet know whether that's true.) 

1. Decompose long production rules. For each production rule $A \to \omega$ whose right side $\omega$ has 
length greater than two, add new production rules of length two that still permit the derivation 
$A \rightsquigarrow^* \omega$. Specifically, suppose $\omega = \alpha\chi$ for some symbol $\alpha \in \Sigma \cup \Gamma$ and string $\chi \in (\Sigma \cup \Gamma)^*$. The 
algorithm replaces $A \to \omega$ with two new production rules $A \to \alpha B$ and $B \to \chi$, where B is a new 
non-terminal, and then (if necessary) recursively decomposes the production rule $B \to \chi$. For 
example, we would replace the long production rule $A \to 0BC1CB$ with the following sequence 
of short production rules, where each $A_i$ is a new non-terminal: 

$$A \to 0A_1 \qquad A_1 \to BA_2 \qquad A_2 \to CA_3 \qquad A_3 \to 1A_4 \qquad A_4 \to CB$$

This stage can significantly increase the number of non-terminals and production rules, but it 
increases the total length of all production rules by at most a small constant factor.[2] Moreover, 
for the remainder of the conversion algorithm, every production rule has length at most two. The 
running time of this stage is O(L). 
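
A minimal Python sketch of this stage follows. It assumes that rules are given as pairs 
(non-terminal, tuple of symbols) and that helper names of the form `A_1`, `A_2`, ... do not clash 
with existing non-terminals; both conventions are made up for this illustration. The loop peels 
off one symbol at a time, which is the recursion described above written iteratively. 

```python
from collections import defaultdict

def decompose(rules):
    """Split every rule whose right side is longer than two symbols."""
    out = []
    used = defaultdict(int)  # number of helpers created so far per non-terminal
    for base, rhs in rules:
        lhs = base
        while len(rhs) > 2:
            used[base] += 1
            helper = f"{base}_{used[base]}"      # assumed to be a fresh name
            out.append((lhs, (rhs[0], helper)))  # A -> alpha B
            lhs, rhs = helper, rhs[1:]           # continue with B -> chi
        out.append((lhs, rhs))
    return out

for lhs, rhs in decompose([("A", ("0", "B", "C", "1", "C", "B"))]):
    print(lhs, "->", " ".join(rhs))
```

Run on the rule $A \to 0BC1CB$, the sketch prints the five short rules listed above. 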

2. Identify nullable non-terminals. A non-terminal A is nullable if and only if $A \rightsquigarrow^* \varepsilon$. The 
recursive definition of $\rightsquigarrow^*$ implies that A is nullable if and only if the grammar contains a 
production rule $A \to \omega$ where $\omega$ consists entirely of nullable non-terminals (in particular, if 
$\omega = \varepsilon$). You may be tempted to transform this recursive characterization directly into a recursive 
algorithm, but this is a bad idea; the resulting algorithm would fall into an infinite loop if (for 
example) the same non-terminal appeared on both sides of the same production rule. Instead, we 
apply the following fixed-point algorithm, which repeatedly scans through the entire grammar 
until a complete scan discovers no new nullable non-terminals. 

    Nullables($\Sigma, \Gamma, R, S$): 
        $\Gamma_\varepsilon \gets \varnothing$        ⟨⟨known nullable non-terminals⟩⟩ 
        done $\gets$ False 
        while $\neg$done 
            done $\gets$ True 
            for each non-terminal $A \in \Gamma \setminus \Gamma_\varepsilon$ 
                for each production rule $A \to \omega$ 
                    if $\omega \in \Gamma_\varepsilon^*$ 
                        add $A$ to $\Gamma_\varepsilon$ 
                        done $\gets$ False 
        return $\Gamma_\varepsilon$ 

At this point in the conversion algorithm, if $S'$ is not identified as nullable, then we can safely 
remove it from the grammar and use the original starting nonterminal S instead. 

As written, Nullables runs in $O(nL) = O(L^2)$ time, where n is the number of non-terminals 
in $\Gamma$. Each iteration of the main loop except the last adds at least one non-terminal to $\Gamma_\varepsilon$, so the 
algorithm halts after at most $n + 1 \le L$ iterations, and in each iteration, we examine at most L 
production rules. There is a faster implementation of Nullables that runs in $O(n + L) = O(L)$ 
time,[3] but since other parts of the conversion algorithm already require $O(L^2)$ time, we needn't 
bother. 
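
For readers who prefer running code, here is a direct Python transcription of the quadratic 
fixed-point algorithm. It assumes rules are given as pairs (non-terminal, tuple of symbols), which 
is a representation chosen for this sketch rather than anything prescribed by the notes. 

```python
def nullables(rules):
    """Return the set of nullable non-terminals."""
    nullable = set()
    done = False
    while not done:
        done = True
        for lhs, rhs in rules:
            # all(...) is vacuously true for an empty right side (A -> eps).
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                done = False
    return nullable

EXAMPLE = [
    ("S'", ("S",)), ("S", ("A",)), ("S", ("B",)),
    ("A", ("0", "A")), ("A", ("0", "C")),
    ("B", ("B", "1")), ("B", ("C", "1")),
    ("C", ()), ("C", ("0", "D")), ("D", ("C", "1")),
]
print(nullables(EXAMPLE))  # prints: {'C'}
```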

[2] In most textbook descriptions of this conversion algorithm, this stage is performed last, after removing 
ε-productions and unit productions. But with the stages in that traditional order, removing ε-productions could 
exponentially increase the length of the grammar in the worst case! Consider the production rule $A \to (BC)^k$, where B 
is nullable but C is not. Decomposing this rule first and then removing ε-productions introduces about 3k new rules, 
whereas removing ε-productions first introduces $2^k$ new rules, most of which must then be further decomposed. 

[3] Consider the bipartite graph whose vertices correspond to non-terminals and the right sides of production rules, 
with one edge per rule. The faster algorithm is a modified breadth-first search of this graph, starting at the vertex 
representing ε. 

3. Eliminate ε-productions. First, remove every production rule of the form $A \to \varepsilon$. Then for 
each production rule $A \to \omega$, add all possible new production rules of the form $A \to \omega'$, where $\omega'$ 
is a non-empty string obtained from $\omega$ by removing one nullable non-terminal. For example, if 
the grammar contained the production rule $A \to BC$, where B and C are both nullable, we would 
add two new production rules $A \to B \mid C$. Finally, if the starting nonterminal $S'$ was identified as 
nullable in the previous stage, add the production rule $S' \to \varepsilon$; this will be the only ε-production 
in the final grammar. This phase of the conversion runs in O(L) time and at most triples the 
number of production rules. 
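
This stage can be sketched in Python as follows, again assuming rules are pairs (non-terminal, 
tuple of symbols) with right sides of length at most two, and assuming the set of nullable 
non-terminals has already been computed. The function and parameter names are invented. 

```python
def eliminate_epsilon(rules, nullable, start):
    """Remove epsilon-productions, compensating with shortened rules."""
    out = set()
    for lhs, rhs in rules:
        if not rhs:
            continue                     # drop A -> eps
        out.add((lhs, rhs))
        for i, x in enumerate(rhs):
            shorter = rhs[:i] + rhs[i + 1:]
            if x in nullable and shorter:
                out.add((lhs, shorter))  # A -> omega with one nullable removed
    if start in nullable:
        out.add((start, ()))             # the only epsilon-production allowed
    return out

RULES = [
    ("S", ("A",)), ("S", ("B",)),
    ("A", ("0", "A")), ("A", ("0", "C")),
    ("B", ("B", "1")), ("B", ("C", "1")),
    ("C", ()), ("C", ("0", "D")), ("D", ("C", "1")),
]
result = eliminate_epsilon(RULES, nullable={"C"}, start="S")
```

On this input the sketch adds $A \to 0$, $B \to 1$, and $D \to 1$ and drops $C \to \varepsilon$. 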

4. Merge equivalent non-terminals. We say that two non-terminals A and B are equivalent if 
they can be derived from each other: $A \rightsquigarrow^* B$ and $B \rightsquigarrow^* A$. Because we have already removed 
ε-productions, any such derivation must consist entirely of unit productions. For example, in the 
grammar 

$$S \to B \mid C, \quad A \to B \mid D \mid CC \mid 0, \quad B \to C \mid AD \mid 1, \quad C \to A \mid DA, \quad D \to BA \mid CS,$$

non-terminals A, B, C are all equivalent, but S is not in that equivalence class (because we cannot 
derive S from A) and neither is D (because we cannot derive A from D). 

Construct a directed graph G whose vertices are the non-terminals and whose edges correspond 
to unit productions, in O(L) time. Then two non-terminals are equivalent if and only if they are 
in the same strong component of G. Compute the strong components of G in O(L) time using, 
for example, the algorithm of Kosaraju and Sharir. Then merge all the non-terminals in each 
equivalence class into a single non-terminal. Finally, remove any unit productions of the form 
$A \to A$. The total running time for this phase is O(L). Starting with our example grammar above, 
merging B and C with A and removing the production $A \to A$ gives us the simpler grammar 

$$S \to A, \quad A \to AA \mid D \mid DA \mid 0 \mid 1, \quad D \to AA \mid AS.$$

We could further simplify the grammar by merging all non-terminals reachable from S using only 
unit productions (in this case, merging non-terminals S and A), but this further simplification is 
unnecessary. 

5. Remove unit productions. Once again, we construct a directed graph G whose vertices are 
the non-terminals and whose edges correspond to unit productions, in O(L) time. Because no 
two non-terminals are equivalent, G is acyclic. Thus, using topological sort, we can index the 
non-terminals $A_1, A_2, \ldots, A_n$ such that for every unit production $A_i \to A_j$ we have $i < j$, again in 
O(L) time; moreover, we can assume that the starting non-terminal is $A_1$. (In fact, both the dag 
G and the linear ordering of non-terminals were already computed in the previous phase!) 

Then for each index j in decreasing order, for each unit production $A_i \to A_j$ and each 
production $A_j \to \omega$, we add a new production rule $A_i \to \omega$. At this point, all unit productions are 
redundant and can be removed. Applying this algorithm to our example grammar above gives us 
the grammar 

$$S \to AA \mid AS \mid DA \mid 0 \mid 1, \quad A \to AA \mid AS \mid DA \mid 0 \mid 1, \quad D \to AA \mid AS.$$

In the worst case, each production rule for $A_n$ is copied to each of the other $n - 1$ 
non-terminals. Thus, this phase runs in $O(nL) = O(L^2)$ time and increases the length of the grammar 
to $\Theta(nL) = O(L^2)$ in the worst case. 
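
Here is a Python sketch of this stage. It leans on the standard-library module `graphlib` 
(Python 3.9 or later) for the topological sort, and it assumes that the unit-production graph is 
already acyclic, as guaranteed by the previous stage; `graphlib` raises `CycleError` otherwise. 
The rule format and function name are assumptions for illustration. Only non-unit rules are 
copied, since any unit rule of $A_j$ has already had its consequences folded into $A_j$ by the time 
$A_j$ is processed. 

```python
from graphlib import TopologicalSorter

def remove_unit_productions(rules, nonterminals):
    """rules: iterable of (lhs, rhs-tuple). Returns an equivalent rule set
    without unit productions, assuming no cycle of unit productions."""
    rules = set(rules)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    # Map each non-terminal A to the targets B of its unit productions A -> B.
    targets = {a: set() for a in nonterminals}
    for lhs, rhs in rules:
        if is_unit(rhs):
            targets[lhs].add(rhs[0])
    # static_order() lists every target before the non-terminals pointing to it.
    for b in TopologicalSorter(targets).static_order():
        b_rules = {rhs for lhs, rhs in rules if lhs == b and not is_unit(rhs)}
        for a in nonterminals:
            if b in targets[a]:
                rules |= {(a, rhs) for rhs in b_rules}
    return {(lhs, rhs) for lhs, rhs in rules if not is_unit(rhs)}

NONTERMINALS = {"S", "A", "B", "C", "D"}
RULES = {
    ("S", ("A",)), ("S", ("B",)),
    ("A", ("0", "A")), ("A", ("0", "C")), ("A", ("0",)),
    ("B", ("B", "1")), ("B", ("C", "1")), ("B", ("1",)),
    ("C", ("0", "D")), ("D", ("C", "1")), ("D", ("1",)),
}
result = remove_unit_productions(RULES, NONTERMINALS)
```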

This phase dominates the running time of the CNF conversion algorithm. Unlike previous 
phases, no faster algorithm for removing unit productions is known! There are grammars of 
length L with unit productions such that any equivalent grammar without unit productions has 
length $\Omega(L^{1.499999})$ (for any desired number of 9s), but this lower bound does not rule out the 
possibility of an algorithm that runs in only $O(L^{3/2})$ time. Closing the gap between $\Omega(L^{3/2-\varepsilon})$ 
and $O(L^2)$ has been an open problem since the early 1980s. 

6. Protect terminals. Finally, for each terminal $a \in \Sigma$, we introduce a new non-terminal $A_a$ 
and a new production rule $A_a \to a$, and then replace a with $A_a$ in every production rule of length 
2. This completes the conversion to Chomsky normal form. As claimed, the total running time of 
the algorithm is $O(L^2)$, and the total length of the output grammar is also $O(L^2)$. 
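
The final stage is the simplest to write down. In this Python sketch, the new non-terminal for a 
terminal a is named `A_a`, following the notation above; the rule format and the assumption 
that such names are not already in use are conventions of the sketch, not of the notes. 

```python
def protect_terminals(rules, terminals):
    """Replace terminals inside length-2 rules by dedicated non-terminals."""
    out = set()
    for lhs, rhs in rules:
        if len(rhs) == 2:
            rhs = tuple(f"A_{x}" if x in terminals else x for x in rhs)
        out.add((lhs, rhs))
    # One new rule A_a -> a per terminal a.
    out |= {(f"A_{a}", (a,)) for a in terminals}
    return out

result = protect_terminals(
    {("S", ("0", "A")), ("A", ("0",)), ("B", ("B", "1"))}, terminals={"0", "1"})
```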

CNF Conversion Example 

As a running example, let's apply these stages one at a time to our first example grammar. 

$$S \to A \mid B \qquad A \to 0A \mid 0C \qquad B \to B1 \mid C1 \qquad C \to \varepsilon \mid 0C1$$

0. Add a new starting non-terminal $S'$. 

$$S' \to S \qquad S \to A \mid B \qquad A \to 0A \mid 0C \qquad B \to B1 \mid C1 \qquad C \to \varepsilon \mid 0C1$$

1. Decompose the long production rule $C \to 0C1$. 

$$S' \to S \qquad S \to A \mid B \qquad A \to 0A \mid 0C \qquad B \to B1 \mid C1 \qquad C \to \varepsilon \mid 0D \qquad D \to C1$$

2. Identify C as the only nullable non-terminal. Because $S'$ is not nullable, remove the 
production rule $S' \to S$. 

3. Eliminate the ε-production $C \to \varepsilon$. 

$$S \to A \mid B \qquad A \to 0A \mid 0C \mid 0 \qquad B \to B1 \mid C1 \mid 1 \qquad C \to 0D \qquad D \to C1 \mid 1$$

4. No two non-terminals are equivalent, so there's nothing to merge. 

5. Remove the unit productions $S \to A$ and $S \to B$. 

$$S \to 0A \mid 0C \mid B1 \mid C1 \mid 0 \mid 1$$
$$A \to 0A \mid 0C \mid 0 \qquad B \to B1 \mid C1 \mid 1 \qquad C \to 0D \qquad D \to C1 \mid 1$$

6. Finally, protect the terminals 0 and 1 to obtain the final CNF grammar. 

$$S \to EA \mid EC \mid BF \mid CF \mid 0 \mid 1$$
$$A \to EA \mid EC \mid 0 \qquad B \to BF \mid CF \mid 1$$
$$C \to ED \qquad D \to CF \mid 1$$
$$E \to 0 \qquad F \to 1$$

Exercises 

1. Describe context-free grammars that generate each of the following languages. The 
function $\#(x, w)$ returns the number of occurrences of the substring x in the string w. For 
example, $\#(0, 101001) = 3$ and $\#(010, 1010100011) = 2$. 

(a) All strings in $\{0, 1\}^*$ whose length is divisible by 5. 

(b) All strings in $\{0, 1\}^*$ representing a non-negative multiple of 5 in binary. 

(c) $\{w \in \{0,1\}^* \mid \#(0,w) = \#(1,w)\}$ 

(d) $\{w \in \{0,1\}^* \mid \#(0,w) \ne \#(1,w)\}$ 

(e) $\{w \in \{0,1\}^* \mid \#(00,w) = \#(11,w)\}$ 

(f) $\{w \in \{0,1\}^* \mid \#(01,w) = \#(10,w)\}$ 

(g) $\{w \in \{0,1\}^* \mid \#(0,w) = \#(1,w) \text{ and } |w| \text{ is a multiple of } 3\}$ 

(h) $\{0,1\}^* \setminus \{0^n1^n \mid n \ge 0\}$ 

(i) $\{0^n1^{2n} \mid n \ge 0\}$ 

(j) $\{0,1\}^* \setminus \{0^n1^{2n} \mid n \ge 0\}$ 

(k) $\{0^n1^m \mid 0 \le 2m \le n < 3m\}$ 

(l) $\{0^i1^j2^{i+j} \mid i, j \ge 0\}$ 

(m) $\{0^i1^j2^k \mid i = j \text{ or } j = k\}$ 

(n) $\{0^i1^j2^k \mid i \ne j \text{ or } j \ne k\}$ 

(o) $\{0^i1^j0^j1^i \mid i, j \ge 0\}$ 

(p) $\{w\$0^{\#(0,w)} \mid w \in \{0,1\}^*\}$ 

(q) $\{xy \mid x, y \in \{0,1\}^* \text{ and } x \ne y \text{ and } |x| = |y|\}$ 

(r) $\{x\$y^R \mid x, y \in \{0,1\}^* \text{ and } x \ne y\}$ 

(s) $\{x\$y \mid x, y \in \{0,1\}^* \text{ and } \#(0,x) = \#(1,y)\}$ 

(t) $\{0,1\}^* \setminus \{ww \mid w \in \{0,1\}^*\}$ 

(u) All strings in $\{0, 1\}^*$ that are not palindromes. 

(v) All strings in $\{(, ), \diamond\}^*$ in which the parentheses are balanced and the symbol $\diamond$ 
appears at most four times. For example, ()(()) and (⋄⋄(()()⋄)()())⋄ and ⋄⋄⋄ 
are strings in this language, but )(() and (⋄⋄⋄)⋄⋄ are not. 



2. Describe recursive automata for each of the languages in problem 1. ("Describe" does not 
necessarily mean "draw"!) 



3. Prove that if L is a context-free language, then $L^R$ is also a context-free language. [Hint: 
How do you reverse a context-free grammar?] 



4. Consider a generalization of context-free grammars that allows any regular expression over 
$\Sigma \cup \Gamma$ to appear on the right side of a production rule. Without loss of generality, for each 
non-terminal $A \in \Gamma$, the generalized grammar contains a single regular expression R(A). To 
apply a production rule to a string, we replace any non-terminal A with an arbitrary word 
in the language described by R(A). As usual, the language of the generalized grammar is 
the set of all strings that can be derived from its start non-terminal. 

For example, the following generalized context-free grammar describes the language 
of all regular expressions over the alphabet {0, 1}: 

$$\begin{array}{ll}
S \to (T\,\texttt{+})^*\,T + \varnothing & \text{(Regular expressions)} \\
T \to \varepsilon + F^*F & \text{(Terms = summable expressions)} \\
F \to (\texttt{0} + \texttt{1} + \texttt{(}S\texttt{)})(\texttt{*} + \varepsilon) & \text{(Factors = concatenable expressions)}
\end{array}$$

Here is a parse tree for the regular expression 0+1(10*1+01*0)*10* (which represents the 
set of all binary numbers divisible by 3): 

[Figure: parse tree for the regular expression 0+1(10*1+01*0)*10*.] 

Prove that every generalized context-free grammar describes a context-free language. 
In other words, show that allowing regular expressions to appear in production rules does 
not increase the expressive power of context-free grammars. 

Caveat lector: This is the zeroth (draft) edition of this lecture note. In particular, some topics 
still need to be written. Please send bug reports and suggestions to jeffe@illinois.edu. 



Think globally, act locally. 

— Attributed to Patrick Geddes (c.1915), among many others. 

We can only see a short distance ahead, 

but we can see plenty there that needs to be done. 

— Alan Turing, "Computing Machinery and Intelligence" (1950) 

Never worry about theory as long as the machinery does what it's supposed 
to do. 

— Robert Anson Heinlein, Waldo & Magic, Inc. (1950) 

6 Turing Machines 

In 1936, a few months before his 24th birthday, Alan Turing launched computer science as a 
modern intellectual discipline. In a single remarkable paper, Turing provided the following 
results: 

• A simple formal model of mechanical computation now known as Turing machines. 

• A description of a single universal machine that can be used to compute any function 
computable by any other Turing machine. 

• A proof that no Turing machine can solve the halting problem: Given the formal description 
of an arbitrary Turing machine M, does M halt or run forever? 

• A proof that no Turing machine can determine whether an arbitrary given proposition 
is provable from the axioms of first-order logic. This is Hilbert and Ackermann's famous 
Entscheidungsproblem ("decision problem"). 

• Compelling arguments[1] that his machines can execute arbitrary "calculation by finite 
means". 

Turing's paper was not the first to prove that the Entscheidungsproblem had no algorithmic 
solution. Alonzo Church published the first proof just a few months earlier, using a very different 
model of computation, now called the untyped λ-calculus. Turing and Church developed their 
results independently; indeed, Turing rushed the submission of his own paper immediately 
after receiving a copy of Church's paper, pausing only long enough to prove that any function 
computable via λ-calculus can also be computed by a Turing machine and vice versa. Church 
was the referee for Turing's paper; between the paper's submission and its acceptance, Turing 
was admitted to Princeton, where he became Church's PhD student. He finished his PhD two 
years later. 

Informally, Turing described a device with a finite number of internal states that has access 
to memory in the form of a tape. The tape consists of a semi- infinite sequence of cells, each 

1 As Turing put it, 'All arguments which can be given are bound to be, fundamentally, appeals to intuition, and for 
this reason rather unsatisfactory mathematically." The claim that anything that can be computed can be computing 
using Turing machines is now known as the Church-Turing thesis. 

© Copyright 2014 Jeff Erickson. 
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). 
Free distribution is strongly encouraged; commercial distribution is expressly forbidden. 
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision. 
containing a single symbol from some arbitrary finite alphabet. The Turing machine can access 
the tape only through its head, which is positioned over a single cell. Initially, the tape contains 
an arbitrary finite input string followed by an infinite sequence of blanks, and the head is 
positioned over the first cell on the tape. In a single iteration, the machine reads the symbol in 
that cell, possibly writes a new symbol into that cell, possibly changes its internal state, possibly 
moves the head to a neighboring cell, and possibly halts. The precise behavior of the machine at 
each iteration is entirely determined by its internal state and the symbol that it reads. When the 
machine halts, it indicates whether it has accepted or rejected the original input string. 
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A few iterations of a six-state Turing machine. 



6.1 Why Bother? 

Students used to thinking of computation in terms of higher-level operations like random memory 
accesses, function calls, and recursion may wonder why we should even consider a model as 
simple and constrained as Turing machines. Admittedly, Turing machines are a terrible model 
for thinking about fast computation; simple operations that take constant time in the standard 
random-access model can require arbitrarily many steps on a Turing machine. Worse, seemingly 
minor variations in the precise definition of "Turing machine" can have significant impact on 
problem complexity. As a simple example (which will make more sense later), we can reverse 
a string of n bits in O(n) time using a two-tape Turing machine, but the same task provably 
requires Ω(n²) time on a single-tape machine. 

But here we are not interested in finding fast algorithms, or indeed in finding algorithms 
at all, but rather in proving that some problems cannot be solved by any computational means. 
Such a bold claim requires a formal definition of "computation" that is simple enough to support 
formal argument, but still powerful enough to describe arbitrary algorithms. Turing machines 
are ideal for this purpose. In particular, Turing machines are powerful enough to simulate other 
Turing machines, while still simple enough to let us build up this self-simulation from scratch, 
unlike more complex but efficient models like the standard random-access machine. 

(Arguably, self-simulation is even simpler in Church's λ-calculus, or in Schönfinkel and 
Curry's combinator calculus, which is one of many reasons those models are more common in 
the design and analysis of programming languages than Turing machines. Those models are much 
more abstract; in particular, they are harder to show equivalent to standard iterative models of 
computation.) 

6.2 Formal Definitions 

Formally, a Turing machine consists of the following components. (Hang on; it's a long list.) 

• An arbitrary finite set Γ with at least two elements, called the tape alphabet. 

• An arbitrary symbol □ ∈ Γ, called the blank symbol or just the blank. 

• An arbitrary nonempty subset Σ ⊆ (Γ \ {□}), called the input alphabet. 

• Another arbitrary finite set Q whose elements are called states. 

• Three distinct special states start, accept, reject ∈ Q. 

• A transition function δ : (Q \ {accept, reject}) × Γ → Q × Γ × {−1, +1}. 

A configuration or global state of a Turing machine is represented by a triple (q, x, i) ∈ 
Q × Γ* × ℕ, indicating that the machine's internal state is q, the tape contains the string x followed 
by an infinite sequence of blanks, and the head is located at position i. Trailing blanks in the 
tape string are ignored; the triples (q, x, i) and (q, x□, i) describe exactly the same configuration. 

The transition function δ describes the evolution of the machine. For example, δ(q, a) = 
(p, b, −1) means that when the machine reads symbol a in state q, it changes its internal state 
to p, writes symbol b onto the tape at its current location (replacing a), and then decreases its 
position by 1 (or more intuitively, moves one step to the left). If the position of the head becomes 
negative, no further transitions are possible, and the machine crashes. 

We write (p, x, i) ⇒_M (q, y, j) to indicate that Turing machine M transitions from the first 
configuration to the second in one step. (The symbol ⇒ is often pronounced "yields"; I will omit 
the subscript M if the machine is clear from context.) For example, δ(p, a) = (q, b, ±1) means 
that 

(p, xay, i) ⇒ (q, xby, i ± 1) 

for any non-negative integer i, any string x of length i, and any string y. The evolution of any 
Turing machine is deterministic; each configuration C yields a unique configuration C′. We write 
C ⇒* C′ to indicate that there is a (possibly empty) sequence of transitions from configuration C 
to configuration C′. (The symbol ⇒* can be pronounced "eventually yields".) 
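
The one-step "yields" relation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the formal development: the function name `step`, the representation of δ as a dictionary keyed by (state, symbol), and the choice to return `None` when no further transition is possible are all conventions invented for this sketch.

```python
BLANK = "□"  # assumed printable stand-in for the blank symbol


def step(delta, config):
    """Apply one transition to a configuration (q, x, i).

    Returns the next configuration, or None if the machine has
    halted (accept/reject) or crashed (negative head position).
    """
    q, x, i = config
    if i < 0 or q in ("accept", "reject"):
        return None
    cells = list(x)
    while len(cells) <= i:        # the tape is padded with blanks on demand
        cells.append(BLANK)
    p, b, move = delta[(q, cells[i])]
    cells[i] = b                  # overwrite the symbol under the head
    # Trailing blanks are ignored, so strip them from the tape string.
    return (p, "".join(cells).rstrip(BLANK), i + move)
```

For example, with δ(p, a) = (q, b, +1), calling `step` on the configuration (p, 0a1, 1) produces (q, 0b1, 2), matching the rule (p, xay, i) ⇒ (q, xby, i + 1) with x = 0 and y = 1.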

The initial configuration is (start, w, 0) for some arbitrary (and possibly empty) input string 
w ∈ Σ*. If M eventually reaches the accept state (more formally, if (start, w, 0) ⇒* (accept, x, i) 
for some string x ∈ Γ* and some integer i), we say that M accepts the original input string w. 
Similarly, if M eventually reaches the reject state, we say that M rejects w. We must emphasize 
that "rejects" and "does not accept" are not synonyms; if M crashes or runs forever, then M 
neither accepts nor rejects w. 

We distinguish between two different senses in which a Turing machine can "accept" a 
language. Let M be a Turing machine with input alphabet Σ, and let L ⊆ Σ* be an arbitrary 
language over Σ. 

• M recognizes or accepts L if and only if M accepts every string in L but nothing else. A 
language is recognizable (or semi-computable or recursively enumerable) if it is recognized 
by some Turing machine. 

• M decides L if and only if M accepts every string in L and rejects every string in Σ* \ L. 
Equivalently, M decides L if and only if M recognizes L and halts (without crashing) on all 
inputs. A language is decidable (or computable or recursive) if it is decided by some Turing 
machine. 

Trivially, every decidable language is recognizable, but (as we will see later), not every recognizable 
language is decidable. 

6.3 A First Example 

Consider the language L = {0ⁿ1ⁿ0ⁿ | n > 0}. This language is neither regular nor context-free, 
but it can be decided by the following six-state Turing machine. The alphabets and states of the 
machine are defined as follows: 

Γ = {0, 1, $, x, □} 
Σ = {0, 1} 

Q = {start, seek1, seek0, reset, verify, accept, reject} 

The transition function is described in the following table; all unspecified transitions lead to the 
reject state. We also give a graphical representation of the same machine, which resembles a 
drawing of a DFA, but with output symbols and actions specified on each edge. For example, we 
indicate the transition δ(p, 0) = (q, 1, +1) by writing 0/1, +1 next to the arrow from state p to 
state q. 
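δ(p, a) = (q, b, Δ) 
δ(start, 0) = (seek1, $, +1) 
δ(start, x) = (verify, $, +1) 
δ(seek1, 0) = (seek1, 0, +1) 
δ(seek1, x) = (seek1, x, +1) 
δ(seek1, 1) = (seek0, x, +1) 
δ(seek0, 1) = (seek0, 1, +1) 
δ(seek0, x) = (seek0, x, +1) 
δ(seek0, 0) = (reset, x, −1) 
δ(reset, 0) = (reset, 0, −1) 
δ(reset, 1) = (reset, 1, −1) 
δ(reset, x) = (reset, x, −1) 
δ(reset, $) = (start, $, +1) 
δ(verify, x) = (verify, $, +1) 
δ(verify, □) = (accept, □, −1) ; success! 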



The transition function for a Turing machine that decides the language {0ⁿ1ⁿ0ⁿ | n > 0}. 



Finally, we trace the execution of this machine on two input strings: 001100 ∈ L and 
00100 ∉ L. In each configuration, we indicate the position of the head using a small triangle 
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A graphical representation of the example Turing machine 



instead of listing the position explicitly. Notice that we automatically add blanks to the tape 
string as necessary. Proving that this machine actually decides L — and in particular, that it never 
crashes or infinite-loops — is a straightforward but tedious exercise in induction. 



(start, 001100) ⇒ (seek1, $01100) ⇒ (seek1, $01100) ⇒ (seek0, $0x100) ⇒ (seek0, $0x100) 
⇒ (reset, $0x1x0) ⇒ (reset, $0x1x0) ⇒ (reset, $0x1x0) ⇒ (reset, $0x1x0) 
⇒ (start, $0x1x0) 

⇒ (seek1, $$x1x0) ⇒ (seek1, $$x1x0) ⇒ (seek0, $$xxx0) ⇒ (seek0, $$xxx0) 
⇒ (reset, $$xxxx) ⇒ (reset, $$xxxx) ⇒ (reset, $$xxxx) ⇒ (reset, $$xxxx) 
⇒ (verify, $$xxxx) ⇒ (verify, $$$xxx) ⇒ (verify, $$$$xx) 
⇒ (verify, $$$$$x) ⇒ (verify, $$$$$$□) ⇒ (accept, $$$$$$) ⇒ accept! 

The evolution of the example Turing machine on the input string 001100 ∈ L 



(start, 00100) ⇒ (seek1, $0100) ⇒ (seek1, $0100) ⇒ (seek0, $0x00) 
⇒ (reset, $0xx0) ⇒ (reset, $0xx0) ⇒ (reset, $0xx0) 
⇒ (start, $0xx0) 

⇒ (seek1, $$xx0) ⇒ (seek1, $$xx0) ⇒ (seek1, $$xx0) ⇒ reject! 

The evolution of the example Turing machine on the input string 00100 ∉ L 
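
The traces above can be reproduced mechanically. The following Python sketch simulates the example machine; it is a hedged illustration rather than a definitive implementation. The transition table `DELTA` is a reconstruction (assumed here) from the traces in this section and from the encoded machine listing given later in these notes; in particular, the leftward move on δ(seek0, 0) is chosen to match the number of reset steps in the traces. The names `run`, `DELTA`, and `max_steps` are invented for this sketch.

```python
BLANK = "□"

# Reconstructed transition table (an assumption, see the lead-in).
# Any (state, symbol) pair not listed leads to the reject state.
DELTA = {
    ("start", "0"): ("seek1", "$", +1),
    ("start", "x"): ("verify", "$", +1),
    ("seek1", "0"): ("seek1", "0", +1),
    ("seek1", "x"): ("seek1", "x", +1),
    ("seek1", "1"): ("seek0", "x", +1),
    ("seek0", "1"): ("seek0", "1", +1),
    ("seek0", "x"): ("seek0", "x", +1),
    ("seek0", "0"): ("reset", "x", -1),
    ("reset", "0"): ("reset", "0", -1),
    ("reset", "1"): ("reset", "1", -1),
    ("reset", "x"): ("reset", "x", -1),
    ("reset", "$"): ("start", "$", +1),
    ("verify", "x"): ("verify", "$", +1),
    ("verify", BLANK): ("accept", BLANK, -1),
}


def run(w, max_steps=10_000):
    """Return 'accept', 'reject', or 'crash'; None if the step budget runs out."""
    tape, state, head = list(w), "start", 0
    for _ in range(max_steps):
        if state in ("accept", "reject"):
            return state
        if head < 0:
            return "crash"
        while len(tape) <= head:      # add blanks to the tape as necessary
            tape.append(BLANK)
        a = tape[head]
        state, b, move = DELTA.get((state, a), ("reject", a, +1))
        tape[head] = b
        head += move
    return None
```

With this table, `run("001100")` returns `"accept"` and `run("00100")` returns `"reject"`, in agreement with the two traces.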



6.4 Variations 

There are actually several formal models that all fall under the name "Turing machine", each 
with small variations on the definition we've given. Although we do need to be explicit about 
which variant we want to use for any particular problem, the differences between the variants are 
relatively unimportant. For any machine defined in one model, there is an equivalent machine in 
each of the other models; in particular, all of these variants recognize the same languages and 
decide the same languages. For example: 
• Halting conditions. Some models allow multiple accept and reject states, which (depending 
on the precise model) trigger acceptance or rejection either when the machine enters 
the state, or when the machine has no valid transitions out of such a state. Others include 
only explicit accept states, and either equate crashing with rejection or do not define a 
rejection mechanism at all. Still other models include halting as one of the possible actions 
of the machine, in addition to moving left or moving right; in these models, the machine 
accepts/rejects its input if and only if it halts in an accepting/non-accepting state. 

• Actions. Some Turing machine models allow transitions that do not move the head, or 
that move the head by more than one cell in a single step. Others insist that a single step of 
the machine either writes a new symbol onto the tape or moves the head one step. Finally, 
as mentioned above, some models include halting as one of the available actions. 

• Transition function. Some models of Turing machines, including Turing's original 
definition, allow the transition function to be undefined on some state-symbol pairs. In this 
formulation, the transition function is given by a set δ ⊂ Q × Γ × Q × Γ × {+1, −1}, such 
that for each state q and symbol a, there is at most one transition (q, a, ·, ·, ·) ∈ δ. If the 
machine enters a configuration from which there is no transition, it halts and (depending 
on the precise model) either crashes or rejects. Others define the transition function as 
δ : Q × Γ → Q × (Γ ∪ {−1, +1}), allowing the machine to either write a symbol to the tape 
or move the head in each step. 

• Beginning of the tape. Some models forbid the head to move past the beginning of the 
tape, either by starting the tape with a special symbol that cannot be overwritten and 
that forces a rightward transition, or by declaring that a leftward transition at position 0 
leaves the head in position 0, or even by pure fiat — declaring any machine that performs a 
leftward move at position 0 to be invalid. 

To prove that any two of these variant "species" of Turing machine are equivalent, we must 
show how to transform a machine of one species into a machine of the other species that accepts 
and rejects the same strings. For example, let M = (Γ, □, Σ, Q, s, accept, reject, δ) be a Turing 
machine with explicit accept and reject states. We can define an equivalent Turing machine 
M′ that halts only when it moves left from position 0, and accepts only by halting while in an 
accepting state, as follows. We define the set of accepting states for M′ as A = {accept} and 
define a new transition function 

    δ′(q, a) :=  (accept, a, −1)   if q = accept, 
                 (reject, a, −1)   if q = reject, 
                 δ(q, a)           otherwise. 

Similarly, suppose someone gives us a Turing machine M = (Γ, □, Σ, Q, s, accept, reject, δ) 
whose transition function δ : Q × Γ → Q × Γ × {−1, 0, +1} allows the machine to transition without 
moving its head. We can construct an equivalent Turing machine M′ = (Γ, □, Σ, Q′, s, accept, reject, δ′) 
that moves its head at every transition by defining Q′ := Q × {0, 1} and 

    δ′((p, 0), a) :=  ((q, 1), b, +1)   if δ(p, a) = (q, b, 0), 
                      ((q, 0), b, Δ)    if δ(p, a) = (q, b, Δ) and Δ ≠ 0, 

    δ′((p, 1), a) :=  ((p, 0), a, −1). 
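
The second construction (replacing each stay-put transition with a step right followed by a step left) is easy to mechanize. The Python sketch below is an illustration under assumptions: δ is represented as a dictionary, states of the new machine are pairs (q, flag), and the helper names `remove_stay_moves` and `run` are invented here. The tiny test machine, which accepts exactly the strings that begin with 1, is also hypothetical.

```python
def remove_stay_moves(delta, states, alphabet):
    """Build a transition table over states (q, flag) that always moves the head."""
    new_delta = {}
    for (p, a), (q, b, move) in delta.items():
        if move == 0:
            # Overshoot one cell to the right and remember to come back.
            new_delta[((p, 0), a)] = ((q, 1), b, +1)
        else:
            new_delta[((p, 0), a)] = ((q, 0), b, move)
    for p in states:
        for a in alphabet:
            # Flag 1 means "step back left without changing anything".
            new_delta[((p, 1), a)] = ((p, 0), a, -1)
    return new_delta


def run(delta, start, halting, w, blank="□", max_steps=1000):
    """Minimal simulator; returns the halting state reached, or None."""
    tape, state, head = list(w), start, 0
    for _ in range(max_steps):
        if state in halting:
            return state
        while len(tape) <= head:
            tape.append(blank)
        state, tape[head], move = delta[(state, tape[head])]
        head += move
    return state if state in halting else None


# Hypothetical machine with stay-put moves: accept iff the input starts with 1.
STATES = ["start", "accept", "reject"]
GAMMA = ["0", "1", "□"]
D = {
    ("start", "1"): ("accept", "1", 0),
    ("start", "0"): ("reject", "0", 0),
    ("start", "□"): ("reject", "□", 0),
}
D2 = remove_stay_moves(D, STATES, GAMMA)
```

In this sketch the start state of the new machine is (start, 0) and its halting states are (accept, 0) and (reject, 0); a stay-put transition into accept therefore takes two steps in the new machine instead of one.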
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6.5 Computing Functions 

Turing machines can also be used to compute functions from strings to strings, instead of just 
accepting or rejecting strings. Since we don't care about acceptance or rejection, we replace 
the explicit accept and reject states with a single halt state, and we define the output of the 
Turing machine to be the contents of the tape when the machine halts, after removing the 
infinite sequence of trailing blanks. More formally, for any Turing machine M, any string w ∈ Σ*, 
and any string x ∈ Γ* that does not end with a blank, we write M(w) = x if and only if 
(s, w, 0) ⇒*_M (halt, x, i) for some integer i, where s is the start state of M. If M does not halt 
on input w, then we write M(w)↑, which can be read either "M diverges on w" or "M(w) is 
undefined." We say that M computes the function f : Σ* → Σ* if and only if M(w) = f(w) for every string w. 

6.5.1 Shifting 

One basic operation that is used in many Turing machine constructions is shifting the input 
string a constant number of steps to the right or to the left. For example, given any input 
string w ∈ {0, 1}*, we can compute the string 0w using a Turing machine with tape alphabet 
Γ = {0, 1, □}, state set Q = {0, 1, halt}, start state 0, and the following transition function: 

δ(p, a) = (q, b, Δ) 
δ(0, 0) = (0, 0, +1) 
δ(0, 1) = (1, 0, +1) 
δ(0, □) = (halt, 0, +1) 
δ(1, 0) = (0, 1, +1) 
δ(1, 1) = (1, 1, +1) 
δ(1, □) = (halt, 1, +1) 
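
To see the shifting machine in action, here is a short Python sketch that runs the six transitions above directly. The state names are the strings "0", "1", and "halt"; the function name `shift_right` and the dictionary representation of δ are assumptions made for this illustration.

```python
BLANK = "□"

# The state remembers the symbol that was overwritten in the previous cell.
DELTA = {
    ("0", "0"): ("0", "0", +1),
    ("0", "1"): ("1", "0", +1),
    ("0", BLANK): ("halt", "0", +1),
    ("1", "0"): ("0", "1", +1),
    ("1", "1"): ("1", "1", +1),
    ("1", BLANK): ("halt", "1", +1),
}


def shift_right(w):
    """Compute the output M(w) of the shifting machine on input w."""
    tape, state, head = list(w), "0", 0
    while state != "halt":
        if head == len(tape):          # extend the tape with a blank
            tape.append(BLANK)
        state, tape[head], move = DELTA[(state, tape[head])]
        head += move
    # The output is the tape contents without trailing blanks.
    return "".join(tape).rstrip(BLANK)
```

The machine always halts after exactly |w| + 1 steps, because every transition moves right and the first blank triggers the halt state.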

By increasing the number of states, we can build a Turing machine that shifts the input string any 
fixed number of steps in either direction. For example, a machine that shifts its input to the left 
by five steps might read the string from right to left, storing the five most recently read symbols in 
its internal state. A typical transition for such a machine would be δ(12345, 0) = (01234, 5, −1). 

6.5.2 Binary Addition 

With a more complex Turing machine, we can implement binary addition. The input is a string of 
the form w+x, where w, x ∈ {0, 1}*, representing two numbers in binary; the output is the binary 
representation of w+x. To simplify our presentation, we assume that |w| = |x| > 0; however, this 
restriction can be removed with the addition of a few more states. The following figure shows 
the entire Turing machine at a glance. The machine uses the tape alphabet Γ = {□, 0, 1, +, 0̲, 1̲}, 
where 0̲ and 1̲ are marked copies of the bits 0 and 1; 
the start state is shift0. All missing transitions go to a fail state, indicating that the input was 
badly formed. 

Execution of this Turing machine proceeds in several phases, each with its own subset of 
states, as indicated in the figure. The initialization phase scans the entire input, shifting it to 
the right to make room for the output string, marking the rightmost bit of w, and reading and 
erasing the last bit of x. 
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A Turing machine that adds two binary numbers of the same length. 



δ(p, a) = (q, b, Δ) 

δ(shift0, 0) = (shift0, 0, +1) 

δ(shift0, 1) = (shift1, 0, +1) 

δ(shift0, +) = (shift+, 0, +1) 

δ(shift0, □) = (add0, □, −1) 

δ(shift1, 0) = (shift0, 1, +1) 

δ(shift1, 1) = (shift1, 1, +1) 

δ(shift1, +) = (shift+, 1, +1) 

δ(shift1, □) = (add1, □, −1) 

δ(shift+, 0) = (shift0, +, +1) 

δ(shift+, 1) = (shift1, +, +1) 

The first part of the main loop scans left to the marked bit of w, adds the bit of x that was 
just erased plus the carry bit from the previous iteration, and records the carry bit for the next 
iteration in the machine's internal state. 
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The second part of the main loop marks the previous bit of w, scans right to the end of x, and 
then reads and erases the last bit of x, all while maintaining the carry bit. 
δ(p, a) = (q, b, Δ) 

δ(last0, 0) = (last0, 0, −1) 

δ(last0, 1) = (last0, 0, −1) 

δ(last0, 0) = (halt, 0, ) 

δ(last1, 0) = (last1, 0, −1) 

δ(last1, 1) = (last1, 0, −1) 

δ(last1, 0) = (halt, 1, ) 
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Finally, after erasing the + in the last iteration of the main loop, the termination phase adds the 
last carry bit to the leftmost output bit and halts. 



6.6 Variations on Tracks, Heads, and Tapes 
Multiple Tracks 

It is sometimes convenient to endow the Turing machine tape with multiple tracks, each with its 
own tape alphabet, and allow the machine to read from and write to the same position on all 
tracks simultaneously. For example, to define a Turing machine with three tracks, we need three 
tape alphabets Γ₁, Γ₂, and Γ₃, each with its own blank symbol, where (say) Γ₁ contains the input 
alphabet Σ as a subset; we also need a transition function of the form 

δ : Q × Γ₁ × Γ₂ × Γ₃ → Q × Γ₁ × Γ₂ × Γ₃ × {−1, +1} 

Describing a configuration of this machine requires a quintuple (q, x₁, x₂, x₃, i), indicating that 
each track j contains the string x_j followed by an infinite sequence of blanks. The initial 
configuration is (start, w, ε, ε, 0), with the input string written on the first track, and the other 
two tracks completely blank. 

But any such machine is equivalent (if not identical) to a single-track Turing machine with 
the (still finite!) tape alphabet Γ := Γ₁ × Γ₂ × Γ₃. Instead of thinking of the tape as three infinite 
sequences of symbols, we think of it as a single infinite sequence of "records", each containing 
three symbols. Moreover, there's nothing special about the number 3 in this construction; a 
Turing machine with any constant number of tracks is equivalent to a single-track machine. 

Doubly-Infinite Tape 

It is also sometimes convenient to allow the tape to be infinite in both directions, for example, 
to avoid boundary conditions. There are several ways to simulate a doubly-infinite tape on a 
machine with only a semi-infinite tape. Perhaps the simplest method is to use a semi-infinite tape 
with two tracks, one containing the cells with positive index and the other containing the cells 
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with negative index in reverse order, with a special marker symbol at position zero to indicate 
the transition. 



    0   +1   +2   +3   +4   ··· 
    ►   −1   −2   −3   −4   ··· 



Another method is to shuffle the positive-index and negative-index cells onto a single track, 
and add additional states to allow the Turing machine to move two steps in a single transition. 
Again, we need a special symbol at the left end of the tape to indicate the transition: 



    0   −1   +1   −2   +2   −3   +3 



A third method maintains two sentinel symbols ► and ◄ that surround all other non-blank 
symbols on the tape. Whenever the machine reads the right sentinel ◄, we write a blank, move 
right, write ◄, move left, and then proceed as if we had just read a blank. On the other hand, 
when the machine reads the left sentinel ►, we shift the entire contents of the tape (up to and 
including the right sentinel) one step to the right, then move back to the left sentinel, move right, 
write a blank, and finally proceed as if we had just read a blank. Since the Turing machine does 
not actually have access to the position of the head as an integer, shifting the head and the tape 
contents one step right has no effect on its future evolution. 



    0   +1   +2   +3   +4   +5 



Using either of the first two methods, we can simulate t steps of an arbitrary Turing machine 
with a doubly-infinite tape using only O(t) steps on a standard Turing machine. The third 
method, unfortunately, requires Θ(t²) steps in the worst case. 
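
The first two methods are really just two ways of renumbering the cells of a doubly-infinite tape. The Python sketch below spells out both index maps; the function names `fold` and `interleave` are invented for this illustration, and the special marker symbol is ignored except for the slot that the two-track layout reserves for it.

```python
def fold(i):
    """Two-track layout: cell i lives at position |i|.

    Track 0 holds the cells with non-negative index; track 1 holds the
    cells with negative index in reverse order. Position 0 of track 1
    is never used by a cell, which leaves room for the marker symbol.
    """
    return (abs(i), 0 if i >= 0 else 1)


def interleave(i):
    """Single-track layout 0, -1, +1, -2, +2, ...: map cell i to its position."""
    return 2 * i if i >= 0 else -2 * i - 1
```

In the interleaved layout, neighboring cells of the original tape are at most two positions apart, which is why the simulating machine only needs extra states that let it move two steps at a time.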



Insertion and Deletion 

We can also allow Turing machines to insert and delete cells on the tape, in addition to simply 
overwriting existing symbols. We've already seen how to insert a new cell: Leave a special mark 
on the tape (perhaps in a second track), shift everything to the right of this mark one cell to the 
right, scan left to the mark, erase the mark, and finally write the correct character into the new 
cell. Deletion is similar: Mark the cell to be deleted, shift everything to the right of the mark one 
step to the left, scan left to the mark, and erase the mark. We may also need to maintain a mark 
in some cell to the right of every non-blank symbol, indicating that all cells further to the right are 
blank, so that we know when to stop shifting left or right. 



Multiple Heads 

Another convenient extension is to allow machines simultaneous access to more than one position 
on the tape. For example, to define a Turing machine with three heads, we need a transition 
function of the form 

δ : Q × Γ³ → Q × Γ³ × {−1, +1}³. 

Describing a configuration of such a machine requires a quintuple (q, x, i, j, k), indicating that the 
machine is in state q, the tape contains string x, and the three heads are at positions i, j, k. The 
transition function tells us, given q and the three symbols x[i], x[j], x[k], which three symbols 
to write on the tape and which direction to move each of the heads. 

We can simulate this behavior with a single head by adding additional tracks to the tape 
that record the positions of each head. To simulate a machine M with three heads, we use a 
tape with four tracks: track 0 is the actual work tape; each of the remaining tracks has a single 
non-blank symbol recording the position of one of the heads. We also insert a special marker 
symbol at the left end of the tape. 



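[Figure: a four-track tape. Track 0 holds ► followed by the work tape contents M Y W O R K T A P E; each of tracks 1, 2, and 3 holds ► at the left end and a single mark ▲ in the cell where the corresponding head is located.] 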

We can simulate any single transition of M, starting with our single head at the left end of 
the tape, as follows. Throughout the simulation, we maintain the internal state of M as one of 
the components of our current state. First, for each i, we read the symbol under the ith head of 
M as follows: 

Scan to the right to find the mark on track i, read the corresponding symbol from 
track 0 into our internal state, and then return to the left end of the tape. 

At this point, our internal state records M's current internal state and the three symbols under 
M's heads. After one more transition (using M's transition function), our internal state records 
M's next state, the symbol to be written by each head, and the direction to move each head. 
Then, for each i, we write with and move the ith head of M as follows: 

Scan to the right to find the mark on track i, write the correct symbol onto track 
0, move the mark on track i one step left or right, and then return to the left end of 
the tape. 

Again, there is nothing special about the number 3 here; we can simulate machines with any 
fixed number of heads. 

Careful analysis of this technique implies that for any integer k, we can simulate t steps 
of an arbitrary Turing machine with k independent heads in O(t²) time on a standard Turing 
machine with only one head. Unfortunately, this quadratic blowup is unavoidable. It is relatively 
easy to recognize the language of marked palindromes {w•wᴿ | w ∈ {0, 1}*} in O(n) time using 
a Turing machine with two heads, but recognizing this language provably requires Ω(n²) time 
on a standard machine with only one head. On the other hand, with much more sophisticated 
techniques, it is possible to simulate t steps of a Turing machine with k heads, for any fixed 
integer k, using only O(t log t) steps on a Turing machine with just two heads. 

Multiple Tapes 

We can also allow machines with multiple independent tapes, each with its own head. To 
simulate such a machine with a single tape, we simply maintain each tape as an independent 
track with its own head. Equivalently, we can simulate a machine with k tapes using a single 
tape with 2k tracks, half storing the contents of the k tapes and half storing the positions of the k 
heads. 



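[Figure: a single tape with six tracks simulating three tapes. Three tracks hold ► followed by the tape contents TAPE#ONE, TAPE#TWO, and TAPE#THREE; below each of them, a second track holds ► and a single mark ▲ in the cell where that tape's head is located.] 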
Just as for multiple heads, for any constant k, we can simulate t steps of an arbitrary Turing 
machine with k independent tapes in O(t²) steps on a standard Turing machine with one tape, 
and this quadratic blowup is unavoidable. Moreover, it is possible to simulate t steps on a 
k-tape Turing machine using only O(t log t) steps on a two-tape Turing machine using more 
sophisticated techniques. (This faster simulation is easier to obtain for multiple independent 
tapes than for multiple heads on the same tape.) 

By combining these tricks, we can simulate a Turing machine with any fixed number of tapes, 
each of which may be infinite in one or both directions, each with any fixed number of heads and 
any fixed number of tracks, with at most a quadratic blowup in the running time. 



6.7 Simulating a Real Computer 



6.7.1 Subroutines and Recursion 



*** 



Use a second tape/track as a "call stack". Add save and restore actions. In the simplest 
formulation, subroutines do not have local memory. To call a subroutine, save the current 
state onto the call stack and jump to the first state of the subroutine. To return, restore (and 
remove) the return state from the call stack. We can simulate t steps of any recursive Turing 
machine with O(t) steps on a multitape standard Turing machine, or in O(t²) steps on a 
standard Turing machine. 

More complex versions of this simulation can adapt to 



*** 



6.7.2 Random-Access Memory 



Keep [address•data] pairs on a separate "memory" tape. Write address to an "address" 
tape; read data from or write data to a "data" tape. Add new or changed [address•data] 
pairs at the end of the memory tape. (Semantics of reading from an address that has never 
been written to?) 

Suppose all memory accesses require at most ℓ address and data bits. Then we can 
simulate the kth memory access in O(kℓ) steps on a multitape Turing machine or in O(k²ℓ²) 
steps on a single-tape machine. Thus, simulating t memory accesses in a random-access 
machine with ℓ-bit words requires O(t²ℓ) time on a multitape Turing machine, or O(t³ℓ²) time 
on a single-tape machine. 
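
The memory-tape idea can be sketched as an append-only log of pairs. The Python class below is only an illustration under assumptions: the class name `MemoryTape`, the `scanned` counter, and the choice to return `None` for an address that has never been written are all invented here (the notes deliberately leave that last question open).

```python
class MemoryTape:
    """Append-only log of (address, data) pairs, scanned linearly on reads."""

    def __init__(self):
        self.pairs = []     # the "memory" tape, oldest pair first
        self.scanned = 0    # total number of pairs examined by reads

    def write(self, address, data):
        # New or changed pairs always go at the end of the tape.
        self.pairs.append((address, data))

    def read(self, address):
        # Scan from the end so the most recent write to an address wins.
        for a, d in reversed(self.pairs):
            self.scanned += 1
            if a == address:
                return d
        return None         # assumed semantics for a never-written address
```

After k − 1 earlier accesses the log holds fewer than k pairs, so the kth access examines fewer than k pairs of O(ℓ) bits each, which is where the O(kℓ) bound for a multitape machine comes from.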



6.8 Universal Turing Machines 

With all these tools in hand, we can now describe the pinnacle of Turing machine constructions: 
the universal Turing machine. For modern computer scientists, it's useful to think of a universal 
Turing machine as a "Turing machine interpreter written in Turing machine". Just as the input to 
a Python interpreter is a string of Python source code, the input to our universal Turing machine 
U is a string ⟨M, w⟩ that encodes both an arbitrary Turing machine M and a string w in the input 
alphabet of M. Given these encodings, U simulates the execution of M on input w; in particular, 

• U accepts ⟨M, w⟩ if and only if M accepts w. 

• U rejects ⟨M, w⟩ if and only if M rejects w. 

In the next few pages, I will sketch a universal Turing machine U that uses the input alphabet 
{0, 1, [, ], •, |} and a somewhat larger tape alphabet (via marks on additional tracks). However, 
I do not require that the Turing machines that U simulates have similarly small alphabets, so we 
first need a method to encode arbitrary input and tape alphabets. 
Encodings 

Let M = (Γ, □, Σ, Q, start, accept, reject, δ) be an arbitrary Turing machine, with a single 
half-infinite tape and a single read-write head. (I will consistently indicate the states and tape symbols 
of M in slanted green to distinguish them from the upright red states and tape symbols of U.) 

We encode each symbol a ∈ Γ as a unique string ⟨a⟩ of ⌈lg |Γ|⌉ bits. Thus, if Γ = {0, 1, $, x, □}, 
we might use the following encoding: 

⟨0⟩ = 001, ⟨1⟩ = 010, ⟨$⟩ = 011, ⟨x⟩ = 100, ⟨□⟩ = 000. 

The input string w is encoded by its sequence of symbol encodings, with separators • between 
every pair of symbols and with brackets [ and ] around the whole string. For example, with this 
encoding, the input string 001100 would be encoded on the input tape as 

⟨001100⟩ = [001•001•010•010•001•001] 
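
The symbol and string encodings are simple enough to generate mechanically. The Python sketch below is a hedged illustration: the function names `make_codes` and `encode_string` are invented here, and numbering the symbols in the order □, 0, 1, $, x is an assumption chosen so that the codes agree with the example encoding above.

```python
def make_codes(symbols):
    """Assign each symbol a distinct fixed-width binary code of ceil(lg n) bits."""
    width = max(1, (len(symbols) - 1).bit_length())
    return {s: format(k, "0{}b".format(width)) for k, s in enumerate(symbols)}


def encode_string(w, codes):
    """Encode w as [code•code•...•code], with • between consecutive symbols."""
    return "[" + "•".join(codes[c] for c in w) + "]"


# Assumed symbol order, chosen to reproduce the example encoding in the text.
CODES = make_codes(["□", "0", "1", "$", "x"])
```

Because every code has the same width, the universal machine can compare two encoded symbols bit by bit without needing to know how many symbols the simulated machine uses.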

Similarly, we encode each state q ∈ Q as a distinct string ⟨q⟩ of ⌈lg |Q|⌉ bits. Without loss of 
generality, we encode the start state with all 1s and the reject state with all 0s. For example, if 
Q = {start, seek1, seek0, reset, verify, accept, reject}, we might use the following encoding: 

⟨start⟩ = 111 ⟨seek1⟩ = 010 ⟨seek0⟩ = 011 ⟨reset⟩ = 100 

⟨verify⟩ = 101 ⟨accept⟩ = 110 ⟨reject⟩ = 000 

We encode the machine M itself as the string ⟨M⟩ = [⟨reject⟩•⟨□⟩]⟨δ⟩, where ⟨δ⟩ is the 
concatenation of substrings [⟨p⟩•⟨a⟩|⟨q⟩•⟨b⟩•⟨Δ⟩] encoding each transition δ(p, a) = (q, b, Δ) 
such that q ≠ reject. We encode the actions Δ = ±1 by defining ⟨−1⟩ := 0 and ⟨+1⟩ := 1. 
Conveniently, every transition string has exactly the same length. For example, with the symbol 
and state encodings described above, the transition δ(reset, $) = (start, $, +1) would be encoded 
as 

[100•011|001•011•1]. 

Our first example Turing machine for recognizing {0ⁿ1ⁿ0ⁿ | n > 0} would be represented by 
the following string (here broken into multiple lines for readability): 

[000•000][[001•001|010•011•1][001•100|101•011•1] 
[010•001|010•001•1][010•100|010•100•1] 
[010•010|011•100•1][011•010|011•010•1] 
[011•100|011•100•1][011•001|100•100•0] 
[100•001|100•001•0][100•010|100•010•0] 
[100•100|100•100•0][100•011|001•011•1] 
[101•100|101•011•1][101•000|110•000•0]] 

Finally, we encode any configuration of M on U's work tape by alternating between encodings 
of states and encodings of tape symbols. Thus, each tape cell is represented by the string 
[⟨q⟩•⟨a⟩] indicating that (1) the cell contains symbol a; (2) if q ≠ reject, then M's head is 
located at this cell, and M is in state q; and (3) if q = reject, then M's head is located somewhere 
else. Conveniently, each cell encoding uses exactly the same number of bits. We also surround 
the entire tape encoding with brackets [ and ]. 

For example, with the encodings described above, the initial configuration (start, 001100, 0) 
for our first example Turing machine would be encoded on U's tape as follows. 




[[⟨start⟩•⟨0⟩][⟨reject⟩•⟨0⟩][⟨reject⟩•⟨1⟩][⟨reject⟩•⟨1⟩][⟨reject⟩•⟨0⟩][⟨reject⟩•⟨0⟩]] 
Similarly, the intermediate configuration (reset, $0x1x0, 3) would be encoded as follows: 

[[⟨reject⟩•⟨$⟩][⟨reject⟩•⟨0⟩][⟨reject⟩•⟨x⟩][⟨reset⟩•⟨1⟩][⟨reject⟩•⟨x⟩][⟨reject⟩•⟨0⟩]] 



Input and Execution 

Without loss of generality, we assume that the input to our universal Turing machine U is given 
on a separate read-only input tape, as the encoding of an arbitrary Turing machine M followed 
by an encoding of its input string x. Notice the substrings [[ and ]] each appear only once 
on the input tape, immediately before and after the encoded transition table, respectively. U also 
has a read-write work tape, which is initially blank. 

We start by initializing the work tape with the encoding ⟨start, x, 0⟩ of the initial configuration 
of M with input x. First, we write [[⟨start⟩•. Then we copy the encoded input string ⟨x⟩ onto 
the work tape, but we change the punctuation as follows: 

• Instead of copying the left bracket [, write [[⟨start⟩•. 

• Instead of copying each separator •, write ][⟨reject⟩•. 

• Instead of copying the right bracket ], write two right brackets ]]. 

The state encodings ⟨start⟩ and ⟨reject⟩ can be copied directly from the beginning of ⟨M⟩ 
(replacing 0s with 1s for ⟨start⟩). Finally, we move the head back to the start of U's tape. 

At the start of each step of the simulation, U's head is located at the start of the work tape.
We scan through the work tape to the unique encoded cell [⟨p⟩•⟨a⟩] such that p ≠ reject.
Then we scan through the encoded transition function ⟨δ⟩ to find the unique encoded tuple
[⟨p⟩•⟨a⟩|⟨q⟩•⟨b⟩•⟨Δ⟩] whose left half matches the encoded tape cell. If there is no such
tuple, then U immediately halts and rejects. Otherwise, we copy the right half ⟨q⟩•⟨b⟩ of the
tuple to the work tape. Now if q = accept, then U immediately halts and accepts. (We don't
bother to encode reject transitions, so we know that q ≠ reject.) Otherwise, we transfer
the state encoding to either the next or previous encoded cell, as indicated by M's transition
function, and then continue with the next step of the simulation.

If, during the final state-copying phase, we ever read two right brackets ]], indicating that
we have reached the right end of the tape encoding, we replace the second right bracket with
[⟨reject⟩•⟨□⟩]] (mostly copied from the beginning of the machine encoding ⟨M⟩) and then scan
back to the left bracket we just wrote. This trick allows our universal machine to pretend that its
tape contains an infinite sequence of encoded blanks [⟨reject⟩•⟨□⟩] instead of actual blanks □.

Example 

As an illustrative example, suppose U is simulating our first example Turing machine M on
the input string 001100. The execution of M on this input eventually reaches the configuration
(seek1, $$x1x0, 3). At the start of the corresponding step in U's simulation, U's work tape
holds the following encoding:

[[000•011][000•011][000•100][010•010][000•100][000•001]]

First U scans for the first encoded tape cell whose state is not reject. That is, U repeatedly
compares the first half of each encoded cell on the work tape with the prefix [⟨reject⟩• of
the machine encoding ⟨M⟩ on the input tape. The first mismatch occurs at the fourth encoded cell.

[[000•011][000•011][000•100][010•010][000•100][000•001]]
                            ^^^^^^^^^

Next, U scans the machine encoding ⟨M⟩ for the substring [010•010 matching the current
encoded cell. U eventually finds a match in the left side of the encoded transition
[010•010|011•100•1]. U copies the state-symbol pair 011•100 from the right half of this
encoded transition into the current encoded cell. (The carets mark the fields that are
rewritten.)

[[000•011][000•011][000•100][011•100][000•100][000•001]]
                             ^^^ ^^^

The encoded transition instructs U to move the current state encoding one cell to the right. (The
carets mark the fields that are rewritten.)

[[000•011][000•011][000•100][000•100][011•100][000•001]]
                             ^^^      ^^^

Finally, U scans left until it reads two left brackets [[; this returns the head to the left end of
the work tape to start the next step in the simulation. U's tape now holds the encoding of M's
configuration (seek0, $$xxx0, 4), as required.

[[000•011][000•011][000•100][000•100][011•100][000•001]]

Exercises 

1. Describe Turing machines that decide each of the following languages: 

(a) Palindromes over the alphabet {0, 1} 

(b) {ww | w ∈ {0, 1}*}

(c) {0^a 1^b 0^{ab} | a, b ∈ N}

2. Let ⟨n⟩_2 denote the binary representation of the non-negative integer n. For example,
⟨17⟩_2 = 10001 and ⟨42⟩_2 = 101010. Describe Turing machines that compute the following
functions from {0, 1}* to {0, 1}*:

(a) w ↦ www

(b) 1^n 0 1^m ↦ 1^{mn}

(c) 1^n ↦ 1^{2^n}

(d) 1^n ↦ ⟨n⟩_2

(e) 0*⟨n⟩_2 ↦ 1^n

(f) ⟨n⟩_2 ↦ ⟨n^2⟩_2

3. Describe Turing machines that write each of the following infinite streams of bits onto 
their tape. Specifically, for each integer n, there must be a finite time after which the first 
n symbols on the tape always match the first n symbols in the target stream. 

(a) An infinite stream of 1s

(b) 0101101110111101111101111110..., where the nth block of 1s has length n.

(c) The stream of bits whose nth bit is 1 if and only if n is prime.

(d) The Thue-Morse sequence T_0 • T_1 • T_2 • T_3 ···, where

T_n = 0 if n = 0
T_n = 1 if n = 1
T_n = T_{n−1} • \overline{T_{n−1}} otherwise

where \overline{w} indicates the binary string obtained from w by flipping every bit. Equivalently,
the nth bit of the Thue-Morse sequence is 0 if the binary representation of n has an
even number of 1s and 1 otherwise.

0 1 10 1001 10010110 1001011001101001 1001011001101001011010010 ...

(e) The Fibonacci sequence F_0 • F_1 • F_2 • F_3 ···, where

F_n = 0 if n = 0
F_n = 1 if n = 1
F_n = F_{n−2} • F_{n−1} otherwise

0 1 01 101 01101 10101101 0110110101101 101011010110110101101 ...

4. A two-stack machine is a Turing machine with two tapes with the following restricted
behavior. At all times, on each tape, every cell to the right of the head is blank, and every 
cell at or to the left of the head is non-blank. Thus, a head can only move right by writing 
a non-blank symbol into a blank cell; symmetrically, a head can only move left by erasing 
the rightmost non-blank cell. Thus, each tape behaves like a stack. To avoid underflow, 
there is a special symbol at the start of each tape that cannot be overwritten. Initially, one 
tape contains the input string, with the head at its last symbol, and the other tape is empty 
(except for the start-of-tape symbol). 

Prove formally that any standard Turing machine can be simulated by a two-stack 
machine. That is, given any standard Turing machine M, describe a two-stack machine 
M' that accepts and rejects exactly the same input strings as M. 



Counter machines. A configuration consists of k non-negative integers and an internal state (from
some finite set Q). The transition function δ: Q × {=0, >0}^k → Q × {−1, 0, +1}^k takes the internal
state and the signs of the counters as input, and produces a new internal state and changes to the counters
as output.

• Prove that any Turing machine can be simulated by a three-counter machine. One
counter holds the binary representation of the tape after the head; another counter
holds the reversed binary representation of the tape before the head. Implement
transitions via halving, doubling, and parity, using the third counter for scratch work.

• Prove that two counters can simulate three. Store 2^a 3^b 5^c in one counter, use the other
for scratch work.

• Prove that a three-counter machine can compute any computable function: Given input
(n, 0, 0), we can compute (f(n), 0, 0) for any computable function f. First transform
(n, 0, 0) to (2^n, 0, 0) using all three counters; then run the two- (or three-)counter TM
simulation to obtain (2^{f(n)}, 0, 0); and finally transform (2^{f(n)}, 0, 0) to (f(n), 0, 0) using all
three counters.

• HARD: Prove that a two-counter machine cannot transform (n, 0) to (2^n, 0). [Barzdin'
1963, Yao 1971, Schroeppel 1972, Ibarra+Tran 1993]

FRACTRAN [Conway 1987]: One-counter machine whose "program" is a sequence of
rational numbers. The counter is initially 1. At each iteration, multiply the counter by the first
rational number that yields an integer; if there is no such number, halt.

• Prove that for any computable function f: N → N, there is a FRACTRAN program that
transforms 2^{n+1} into 3^{f(n)+1}, for all natural numbers n.

• Prove that every FRACTRAN program, given the integer 1 as input, either outputs 1 or
loops forever. It follows that there is no FRACTRAN program for the increment function
n ↦ n + 1.



5. A tag-Turing machine has two heads: one can only read, the other can only write. Initially, 
the read head is located at the left end of the tape, and the write head is located at the 
first blank after the input string. At each transition, the read head can either move one cell 
to the right or stay put, but the write head must write a symbol to its current cell and move 
one cell to the right. Neither head can ever move to the left. 

Prove that any standard Turing machine can be simulated by a tag-Turing machine. 
That is, given any standard Turing machine M, describe a tag-Turing machine M' that 
accepts and rejects exactly the same input strings as M. 

6. *(a) Prove that any standard Turing machine can be simulated by a Turing machine with
only three states. [Hint: Use the tape to store an encoding of the state of the machine
yours is simulating.]

*(b) Prove that any standard Turing machine can be simulated by a Turing machine with 
only two states. 

7. A two-dimensional Turing machine uses an infinite two-dimensional grid of cells as 
the tape; at each transition, the head can move from its current cell to any of its 
four neighbors on the grid. The transition function of such a machine has the form
δ: Q × Γ → Q × Γ × {↑, ←, ↓, →}, where the arrows indicate which direction the head should
move. 



(a) Prove that any two-dimensional Turing machine can be simulated by a standard 
Turing machine. 

(b) Suppose further that we endow our two-dimensional Turing machine with the 
following additional actions, in addition to moving the head: 

• Insert row: Move all symbols on or above the row containing the head up one 
row, leaving the head's row blank. 

• Insert column: Move all symbols on or to the right of the column containing the 
head one column to the right, leaving the head's column blank. 

• Delete row: Move all symbols above the row containing the head down one row, 
deleting the head's row of symbols. 

• Delete column: Move all symbols to the right of the column containing the head
one column to the left, deleting the head's column of symbols.

Show that any two-dimensional Turing machine that can also insert and delete rows and
columns can be simulated by a standard Turing machine.

8. A binary-tree Turing machine uses an infinite binary tree as its tape; that is, every cell in 
the tape has a left child and a right child. At each step, the head moves from its current
cell to its Parent, its Left child, or to its Right child. Thus, the transition function of such a
machine has the form δ: Q × Γ → Q × Γ × {P, L, R}. The input string is initially given along
the left spine of the tape. 

Show that any binary-tree Turing machine can be simulated by a standard Turing 
machine. 

9. A stack-tape Turing machine uses a semi-infinite tape, where every cell is actually the
top of an independent stack. The behavior of the machine at each iteration is governed by 
its internal state and the symbol at the top of the current cell's stack. At each transition, 
the head can optionally push a new symbol onto the stack, or pop the top symbol off the 
stack. (If a stack is empty, its "top symbol" is a blank and popping has no effect.) 

Show that any stack-tape Turing machine can be simulated by a standard Turing 
machine. (Compare with Problem 4!) 

10. A tape-stack Turing machine has two actions that modify its work tape, in addition to 
simply writing individual cells: it can save the entire tape by pushing it onto a stack, and it
can restore the entire tape by popping it off the stack. Restoring a tape returns the content 
of every cell to its content when the tape was saved. Saving and restoring the tape do not 
change the machine's state or the position of its head. If the machine attempts to "restore" 
the tape when the stack is empty, the machine crashes. 

Show that any tape-stack Turing machine can be simulated by a standard Turing 
machine. 



• Tape alphabet = N. 

- Read: zero or positive. Write: +1, —1 

- Read: even or odd. Write: +1, −1, ×2, ÷2

- Read: positive, negative, or zero. Write: x + y (merge), x − y (merge), 1, 0

• Never three times in a row in the same direction

• Hole-punch TM: tape alphabet {□, ■}, and only □ ↦ ■ transitions allowed.



© Copyright 2014 Jeff Erickson. 
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/).
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision.

Caveat lector: This note is not even a first draft, but more of a rough sketch, with many
topics still to be written and/or unwritten. But the semester is over, so it's time to put it down. 
Please send bug reports and suggestions to jeffe@illinois.edu. 



Any sufficiently advanced technology is indistinguishable from magic. 

— Arthur C. Clarke, "Hazards of Prophecy: The Failure of Imagination" (1962) 

Any technology that is distinguishable from magic is insufficiently advanced. 

— Barry Gehm, quoted by Stan Schmidt in ANALOG magazine (1991) 



7 Universal Models of Computation 






Remind about the Church-Turing thesis. 

There is some confusion here between universal models of computation and the 
somewhat wider class of undecidable problems. 



7.1 Universal Turing Machines 

The pinnacle of Turing machine constructions is the universal Turing machine. For modern 
computer scientists, it's useful to think of a universal Turing machine as a "Turing machine 
interpreter written in Turing machine". Just as the input to a Python interpreter is a string of 
Python source code, the input to our universal Turing machine U is a string (M, w) that encodes 
both an arbitrary Turing machine M and a string w in the input alphabet of M. Given these 
encodings, U simulates the execution of M on input w; in particular, 

• U accepts (M, w) if and only if M accepts w. 

• U rejects (M, w) if and only if M rejects w. 

In the next few pages, I will sketch a universal Turing machine U that uses the input alphabet
{0, 1, [, ], •, |} and a somewhat larger tape alphabet. However, I do not require that the Turing
machines that U simulates have similarly small alphabets, so we first need a method to encode 
arbitrary input and tape alphabets. 



Encodings 

Let M = (Γ, □, Σ, Q, start, accept, reject, δ) be an arbitrary Turing machine, with a single
half-infinite tape and a single read-write head. (I will consistently indicate the states and tape symbols
of M in slanted green to distinguish them from the upright red states and tape symbols of U.)

We encode each symbol a ∈ Γ as a unique string ⟨a⟩ of ⌈lg |Γ|⌉ bits. Thus, if Γ = {0, 1, $, x, □},
we might use the following encoding:

⟨0⟩ = 001, ⟨1⟩ = 010, ⟨$⟩ = 011, ⟨x⟩ = 100, ⟨□⟩ = 000.

The input string w is encoded by its sequence of symbol encodings, with separators • between
every pair of symbols and with brackets [ and ] around the whole string. For example, with this
encoding, the input string 001100 would be encoded on the input tape as

⟨001100⟩ = [001•001•010•010•001•001]

Similarly, we encode each state q ∈ Q as a distinct string ⟨q⟩ of ⌈lg |Q|⌉ bits. Without loss of
generality, we encode the start state with all 1s and the reject state with all 0s. For example, if
Q = {start, seek1, seek0, reset, verify, accept, reject}, we might use the following encoding:

⟨start⟩ = 111   ⟨seek1⟩ = 010   ⟨seek0⟩ = 011   ⟨reset⟩ = 100

⟨verify⟩ = 101   ⟨accept⟩ = 110   ⟨reject⟩ = 000

We encode the machine M itself as the string ⟨M⟩ = [⟨reject⟩•⟨□⟩]⟨δ⟩, where ⟨δ⟩ is the
concatenation of substrings [⟨p⟩•⟨a⟩|⟨q⟩•⟨b⟩•⟨Δ⟩] encoding each transition δ(p, a) = (q, b, Δ)
such that q ≠ reject. We encode the actions Δ = ±1 by defining ⟨−1⟩ := 0 and ⟨+1⟩ := 1.
Conveniently, every transition string has exactly the same length. For example, with the symbol
and state encodings described above, the transition δ(reset, $) = (start, $, +1) would be encoded
as

[100•011|001•011•1].
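The string and transition encodings above are mechanical enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function names `encode_string` and `encode_transition` are hypothetical, the code tables are copied from the examples above, and the start state is left out of the table because it is not needed here.

```python
SEP, BAR = "•", "|"

# Symbol and state codes copied from the running example.
SYMBOL = {"0": "001", "1": "010", "$": "011", "x": "100", "□": "000"}
STATE = {"seek1": "010", "seek0": "011", "reset": "100",
         "verify": "101", "accept": "110", "reject": "000"}

def encode_string(w):
    """Encode an input string: symbol codes joined by separators, in brackets."""
    return "[" + SEP.join(SYMBOL[a] for a in w) + "]"

def encode_transition(p, a, q, b, move):
    """Encode the transition delta(p, a) = (q, b, move), where move is -1 or +1."""
    action = "1" if move == +1 else "0"
    return f"[{STATE[p]}{SEP}{SYMBOL[a]}{BAR}{STATE[q]}{SEP}{SYMBOL[b]}{SEP}{action}]"

print(encode_string("001100"))
print(encode_transition("seek1", "1", "seek0", "x", +1))
```

The second call encodes the transition δ(seek1, 1) = (seek0, x, +1), which appears in the encoded transition table of the first example machine.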

Our first example Turing machine for recognizing {0^n 1^n 0^n | n > 0} would be represented by
the following string (here broken into multiple lines for readability):

[000•000][[001•001|010•011•1][001•100|101•011•1]
[010•001|010•001•1][010•100|010•100•1]
[010•010|011•100•1][011•010|011•010•1]
[011•100|011•100•1][011•001|100•100•1]
[100•001|100•001•0][100•010|100•010•0]
[100•100|100•100•0][100•011|001•011•1]
[101•100|101•011•1][101•000|110•000•0]]

Finally, we encode any configuration of M on U's work tape by alternating between encodings
of states and encodings of tape symbols. Thus, each tape cell is represented by the string
[⟨q⟩•⟨a⟩] indicating that (1) the cell contains symbol a; (2) if q ≠ reject, then M's head is
located at this cell, and M is in state q; and (3) if q = reject, then M's head is located somewhere
else. Conveniently, each cell encoding uses exactly the same number of bits. We also surround
the entire tape encoding with brackets [ and ].
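As a rough sketch of this configuration encoding, the following Python function tags the cell under the head with the current state and every other cell with reject. The name `encode_configuration` is hypothetical, and the code tables are the example encodings from this section (again omitting the start state, which this example does not need).

```python
SYMBOL = {"0": "001", "1": "010", "$": "011", "x": "100", "□": "000"}
STATE = {"seek1": "010", "seek0": "011", "reset": "100",
         "verify": "101", "accept": "110", "reject": "000"}

def encode_configuration(state, tape, head):
    """Encode the configuration (state, tape, head) as a bracketed list of cells."""
    cells = []
    for i, a in enumerate(tape):
        # Only the cell under the head carries the real state.
        q = state if i == head else "reject"
        cells.append(f"[{STATE[q]}•{SYMBOL[a]}]")
    return "[" + "".join(cells) + "]"

print(encode_configuration("reset", "$0x1x0", 3))
```

Because every state code and every symbol code has the same width, every encoded cell has the same length, which is what lets U step from cell to cell by counting.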

For example, with the encodings described above, the initial configuration (start, 001100, 0)
for our first example Turing machine would be encoded on U's tape as follows.

[[⟨start⟩•⟨0⟩][⟨reject⟩•⟨0⟩][⟨reject⟩•⟨1⟩][⟨reject⟩•⟨1⟩][⟨reject⟩•⟨0⟩][⟨reject⟩•⟨0⟩]]

Similarly, the intermediate configuration (reset, $0x1x0, 3) would be encoded as follows:

[[⟨reject⟩•⟨$⟩][⟨reject⟩•⟨0⟩][⟨reject⟩•⟨x⟩][⟨reset⟩•⟨1⟩][⟨reject⟩•⟨x⟩][⟨reject⟩•⟨0⟩]]



Input and Execution 

Without loss of generality, we assume that the input to our universal Turing machine U is given
on a separate read-only input tape, as the encoding of an arbitrary Turing machine M followed
by an encoding of its input string x. Notice that the substrings [[ and ]] each appear only once
on the input tape, immediately before and after the encoded transition table, respectively. U also
has a read-write work tape, which is initially blank.

We start by initializing the work tape with the encoding ⟨start, x, 0⟩ of the initial configuration
of M with input x. To do so, we copy the encoded input string ⟨x⟩ onto
the work tape, but we change the punctuation as follows:

• Instead of copying the left bracket [, write [[⟨start⟩•.

• Instead of copying each separator •, write ][⟨reject⟩•.

• Instead of copying the right bracket ], write two right brackets ]].

The state encodings ⟨start⟩ and ⟨reject⟩ can be copied directly from the beginning of ⟨M⟩
(replacing 0s with 1s for ⟨start⟩). Finally, we move the head back to the start of U's work tape.

At the start of each step of the simulation, U's head is located at the start of the work tape.
We scan through the work tape to the unique encoded cell [⟨p⟩•⟨a⟩] such that p ≠ reject.
Then we scan through the encoded transition function ⟨δ⟩ to find the unique encoded tuple
[⟨p⟩•⟨a⟩|⟨q⟩•⟨b⟩•⟨Δ⟩] whose left half matches the encoded tape cell. If there is no such
tuple, then U immediately halts and rejects. Otherwise, we copy the right half ⟨q⟩•⟨b⟩ of the
tuple to the work tape. Now if q = accept, then U immediately halts and accepts. (We don't
bother to encode reject transitions, so we know that q ≠ reject.) Otherwise, we transfer
the state encoding to either the next or previous encoded cell, as indicated by M's transition
function, and then continue with the next step of the simulation.

If, during the final state-copying phase, we ever read two right brackets ]], indicating that
we have reached the right end of the tape encoding, we replace the second right bracket with
[⟨reject⟩•⟨□⟩]] (mostly copied from the beginning of the machine encoding ⟨M⟩) and then scan
back to the left bracket we just wrote. This trick allows our universal machine to pretend that its
tape contains an infinite sequence of encoded blanks [⟨reject⟩•⟨□⟩] instead of actual blanks □.
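One step of this simulation can be sketched in Python, working directly on the encoded strings. This is not a Turing machine, only an executable paraphrase of the steps above. The names `parse_machine` and `step` are hypothetical, the accept code 110 is taken from the example encoding and passed in as a parameter, and the sketch assumes that M never moves left from the first tape cell.

```python
import re

# Encoded first example machine, copied from the table in this section.
M = ("[000•000][[001•001|010•011•1][001•100|101•011•1]"
     "[010•001|010•001•1][010•100|010•100•1]"
     "[010•010|011•100•1][011•010|011•010•1]"
     "[011•100|011•100•1][011•001|100•100•1]"
     "[100•001|100•001•0][100•010|100•010•0]"
     "[100•100|100•100•0][100•011|001•011•1]"
     "[101•100|101•011•1][101•000|110•000•0]]")

def parse_machine(enc):
    """Split an encoded machine into (reject code, blank code, transition table)."""
    reject, blank = re.match(r"\[([01]+)•([01]+)\]", enc).groups()
    table = {}
    for p, a, q, b, d in re.findall(r"\[([01]+)•([01]+)\|([01]+)•([01]+)•([01])\]", enc):
        table[(p, a)] = (q, b, +1 if d == "1" else -1)
    return reject, blank, table

def step(machine, tape, accept="110"):
    """Simulate one transition on an encoded tape; return (status, new tape)."""
    reject, blank, table = parse_machine(machine)
    cells = [c.split("•") for c in re.findall(r"\[([01]+•[01]+)\]", tape)]
    # Find the unique cell whose state code is not the reject code.
    head = next(i for i, (q, _) in enumerate(cells) if q != reject)
    p, a = cells[head]
    if (p, a) not in table:
        return "reject", tape            # no matching tuple: halt and reject
    q, b, move = table[(p, a)]
    cells[head] = [q, b]                 # copy the right half into the cell
    if q != accept:
        if head + move < 0:
            raise ValueError("assumed: M never moves left from the first cell")
        if head + move == len(cells):    # ran off the right end: append a blank
            cells.append([reject, blank])
        cells[head][0] = reject          # transfer the state to the neighbor
        cells[head + move][0] = q
    status = "accept" if q == accept else "running"
    return status, "[" + "".join(f"[{s}•{c}]" for s, c in cells) + "]"

before = "[[000•011][000•011][000•100][010•010][000•100][000•001]]"
print(step(M, before))
```

Applied to the encoded configuration (seek1, $$x1x0, 3), the sketch should produce the encoding of (seek0, $$xxx0, 4), matching the worked example in this section.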

Example 

As an illustrative example, suppose U is simulating our first example Turing machine M on
the input string 001100. The execution of M on this input eventually reaches the configuration
(seek1, $$x1x0, 3). At the start of the corresponding step in U's simulation, U's work tape
holds the following encoding:

[[000•011][000•011][000•100][010•010][000•100][000•001]]

First U scans for the first encoded tape cell whose state is not reject. That is, U repeatedly
compares the first half of each encoded cell on the work tape with the prefix [⟨reject⟩• of
the machine encoding ⟨M⟩ on the input tape. The first mismatch occurs at the fourth encoded cell.

[[000•011][000•011][000•100][010•010][000•100][000•001]]
                            ^^^^^^^^^

Next, U scans the machine encoding ⟨M⟩ for the substring [010•010 matching the current
encoded cell. U eventually finds a match in the left side of the encoded transition
[010•010|011•100•1]. U copies the state-symbol pair 011•100 from the right half of this
encoded transition into the current encoded cell. (The carets mark the fields that are
rewritten.)

[[000•011][000•011][000•100][011•100][000•100][000•001]]
                             ^^^ ^^^

The encoded transition instructs U to move the current state encoding one cell to the right. (The
carets mark the fields that are rewritten.)

[[000•011][000•011][000•100][000•100][011•100][000•001]]
                             ^^^      ^^^

Finally, U scans left until it reads two left brackets [[; this returns the head to the left end of
the work tape to start the next step in the simulation. U's tape now holds the encoding of M's
configuration (seek0, $$xxx0, 4), as required.

[[000•011][000•011][000•100][000•100][011•100][000•001]]

7.2 Two-Stack Machines

A two-stack machine is a Turing machine with two tapes with the following restricted behavior. 
At all times, on each tape, every cell to the right of the head is blank, and every cell at or to the 
left of the head is non-blank. Thus, a head can only move right by writing a non-blank symbol 
into a blank cell; symmetrically, a head can only move left by erasing the rightmost non-blank 
cell. Thus, each tape behaves like a stack. To avoid underflow, there is a special symbol at the 
start of each tape that cannot be overwritten. Initially, one tape contains the input string, with 
the head at its last symbol, and the other tape is empty (except for the start-of-tape symbol) . 



Simulate a doubly-infinite tape with two stacks, one holding the tape contents to the left of
the head, the other holding the tape contents to the right of the head. For each transition
of a standard Turing machine M, the stack machine pops the top symbol off the (say) left
stack, changes its internal state according to the transition function δ, and then either pushes
a new symbol onto the right stack (to move left), or pushes a new symbol onto the left stack and
then moves the top symbol from the right stack to the left stack (to move right).
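The stack manipulation in this simulation can be sketched with two Python lists used as stacks. The names `tm_step` and `tape_contents` are hypothetical, and `BLANK` is an assumed stand-in symbol for cells that M has not visited yet (a real two-stack machine would use some designated non-blank symbol for this).

```python
BLANK = "_"  # assumed stand-in for a blank cell of the simulated tape

def tm_step(left, right, write, move):
    """Apply one tape update of M.

    left[-1] is the symbol under M's head; right[-1] is the symbol just to
    the right of the head. `write` is the symbol M writes; `move` is -1 or +1.
    """
    left.pop()                           # pop the symbol under the head
    if move == -1:
        right.append(write)              # written cell is now right of the head
        if not left:
            left.append(BLANK)           # moved onto an unvisited cell
    else:
        left.append(write)               # written cell is now left of the head
        left.append(right.pop() if right else BLANK)

def tape_contents(left, right):
    """Read off the simulated tape from left to right."""
    return "".join(left) + "".join(reversed(right))

left, right = ["a"], ["c", "b"]          # tape abc, head on a
tm_step(left, right, "x", +1)            # write x, move right
tm_step(left, right, "y", -1)            # write y, move left
print(tape_contents(left, right), left[-1])
```

After the two steps the simulated tape reads xyc with the head back on the first cell.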



7.3 Counter Machines 



A configuration of a k-counter machine consists of k non-negative integers and an internal
state from some finite set Q. The transition function δ: Q × {0, +1}^k → Q × {−1, 0, +1}^k takes
an internal state and the signs of the counters as input, and produces a new internal state and
changes to counters as output.

• Prove that any Turing machine can be simulated by a three-counter machine. One
counter holds the binary representation of the tape after the head; another counter
holds the reversed binary representation of the tape before the head. Implement
transitions via halving, doubling, and parity, using the third counter for scratch work.

• Prove that two counters can simulate three. Store 2^a 3^b 5^c in one counter, use the other
for scratch work.

• Prove that a three-counter machine can compute any computable function: Given input
(n, 0, 0), we can compute (f(n), 0, 0) for any computable function f. First transform
(n, 0, 0) to (2^n, 0, 0) using all three counters; then run the two- (or three-)counter TM
simulation to obtain (2^{f(n)}, 0, 0); and finally transform (2^{f(n)}, 0, 0) to (f(n), 0, 0) using all
three counters.

• HARD: Prove that a two-counter machine cannot transform (n, 0) to (2^n, 0). [Barzhdin
1963, Yao 1971, Schroeppel 1972]
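The hint for the two-counter simulation can be made concrete with a small Python sketch. It is not a proof, only an illustration of how incrementing, decrementing, and zero-testing a simulated counter reduce to multiplying, dividing, and testing divisibility of the code 2^a 3^b 5^c. All function names are hypothetical. Constant-size additions such as `scratch += p` stand for a fixed run of +1 steps driven by the finite control.

```python
PRIMES = (2, 3, 5)  # simulated counter i lives in the exponent of PRIMES[i]

def multiply(c, p):
    """Replace c by c*p using only decrements, increments, and zero tests."""
    scratch = 0
    while c > 0:
        c -= 1
        scratch += p          # p single increments, unrolled in the finite control
    while scratch > 0:        # move the result back into the main counter
        scratch -= 1
        c += 1
    return c

def try_divide(c, p):
    """Return (c // p, True) if p divides c, and (c, False) otherwise."""
    scratch = 0
    while c > 0:
        r = 0                 # r < p is remembered in the finite control
        while r < p and c > 0:
            c -= 1
            r += 1
        if r == p:
            scratch += 1
        else:                 # ran out early: undo everything
            c += r
            while scratch > 0:
                scratch -= 1
                c += p
            return c, False
    while scratch > 0:
        scratch -= 1
        c += 1
    return c, True

def increment(code, i):
    return multiply(code, PRIMES[i])

def decrement(code, i):
    code, _ = try_divide(code, PRIMES[i])   # unchanged if counter i is zero
    return code

def is_zero(code, i):
    _, divisible = try_divide(code, PRIMES[i])
    return not divisible      # a real machine would multiply back afterwards

code = 1                      # a = b = c = 0
code = increment(increment(code, 0), 0)
code = increment(code, 2)
print(code, is_zero(code, 1))
```

Here the code 20 = 2^2 · 5 represents the counter values (a, b, c) = (2, 0, 1).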
7.4 FRACTRAN

FRACTRAN [Conway 1987]: A one-counter machine whose "program" is a sequence of rational
numbers. The counter is initially 1. At each iteration, multiply the counter by the first rational
number that yields an integer; if there is no such number, halt.

• Prove that for any computable function f: N → N, there is a FRACTRAN program that
transforms 2^{n+1} into 3^{f(n)+1}, for all natural numbers n.

• Prove that every FRACTRAN program, given the integer 1 as input, either outputs 1 or
loops forever. It follows that there is no FRACTRAN program for the increment function
n ↦ n + 1.
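The FRACTRAN execution rule is short enough to write out as a Python sketch. The function name `fractran` and the `max_steps` cutoff are assumptions added for illustration; the cutoff is there only so that a looping program returns instead of running forever.

```python
from fractions import Fraction

def fractran(program, n, max_steps=10_000):
    """Run a FRACTRAN program (a list of Fractions) on the integer n.

    Returns the final counter value, or None if the program has not
    halted within max_steps iterations.
    """
    for _ in range(max_steps):
        for f in program:
            product = n * f
            if product.denominator == 1:   # first fraction yielding an integer
                n = int(product)
                break
        else:
            return n                       # no fraction applies: halt
    return None

# The one-fraction program [3/2] turns 2^a 3^b into 3^(a+b).
print(fractran([Fraction(3, 2)], 2**3 * 3**2))
```

With input 1, this program halts immediately with output 1, while the program [2/1] never halts on input 1; this is the dichotomy described in the second exercise above.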



7.5 Post Correspondence Problem 

Given n pairs of strings (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), is there a finite sequence of integers
(i_1, i_2, ..., i_k) such that x_{i_1} x_{i_2} ··· x_{i_k} = y_{i_1} y_{i_2} ··· y_{i_k}? For notational convenience, we write each pair
vertically, with x above y, instead of horizontally as (x, y). For example, given the string pairs

    a = [  0  ]    b = [ 01 ]    c = [ 110 ]
        [ 100 ]        [ 00 ]        [ 11  ]

we should answer True, because

    [ 110 ] [ 01 ] [ 110 ] [  0  ]
    [ 11  ] [ 00 ] [ 11  ] [ 100 ]

gives us 110011100 for both concatenations. As more extreme examples, the shortest solutions
for the input

    a = [  0  ]    b = [ 001 ]    c = [ 1 ]
        [ 001 ]        [  1  ]        [ 0 ]

have length 75; one such solution is aacaacabbabccaaccaaaacbaabbaacbacbbccbbacbaccbcb 
acbbacbaccbacbbbacccbabbccbaacaacaaacbabbaacacbccbbabacbcaaccbacabbbbabcccc 
bcaababaaccbcbbbacccbabbccb. The shortest solution for the instance 





    a = [  0  ]    b = [  0   ]    c = [ 01 ]    d = [ 1111 ]
        [ 000 ]        [ 0101 ]        [ 1  ]        [  10  ]



is the unbelievable a 2 b 8 a 4 c 16 ab 4 a 2 b 4 ad 4 b 3 c 8 a 6 c 8 b 2 c 4 bc 6 d 2 a 18 d 2 c 4 dcad 2 cb 54 c 3 dca 2 c lu dc 
a 6 d 28 cb 17 c 63 d 16 c 16 d 4 c 4 dc, which has total length 451. Finally, the shortest solution for the 
instance 





    a = [   0   ]    b = [ 010 ]    c = [ 100 ]
        [ 00010 ]        [ 01  ]        [  0  ]



has length 528. 
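Checking a proposed solution is easy even though finding one is not. The following Python sketch verifies a candidate index sequence and, for tiny instances only, searches for a shortest solution by brute force. The function names are hypothetical, pairs are written as (top, bottom) tuples, and the exhaustive search is exponential in the length bound, so it is hopeless for instances like the extreme examples above.

```python
from itertools import product

def concatenations(pairs, seq):
    """Return the top and bottom concatenations for an index sequence."""
    top = "".join(pairs[i][0] for i in seq)
    bottom = "".join(pairs[i][1] for i in seq)
    return top, bottom

def is_solution(pairs, seq):
    """A solution is a nonempty sequence whose two concatenations agree."""
    top, bottom = concatenations(pairs, seq)
    return len(seq) > 0 and top == bottom

def shortest_solution(pairs, max_len):
    """Try all index sequences of length 1, 2, ..., max_len in order."""
    for k in range(1, max_len + 1):
        for seq in product(range(len(pairs)), repeat=k):
            if is_solution(pairs, seq):
                return seq
    return None

# First example instance: a = (0, 100), b = (01, 00), c = (110, 11).
pairs = [("0", "100"), ("01", "00"), ("110", "11")]
cbca = (2, 1, 2, 0)
print(concatenations(pairs, cbca), shortest_solution(pairs, 4))
```

For the first example instance, the sequence c, b, c, a is the only solution of length at most four.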



The simplest universality proof simulates a tag-Turing machine. 



7.6 Matrix Mortality 






Given a set of integer matrices A_1, ..., A_k, is the product of any sequence of these matrices
(with repetition) equal to 0? Undecidable by reduction from PCP, even for two 15 × 15 matrices
or six 3 × 3 matrices [Cassaigne, Halava, Harju, Nicolas 2014].

7.7 Dynamical Systems






Ray Tracing [Reif, Tygar, and Yoshida 1994]: The configuration of a Turing machine is encoded
as the (x, y) coordinates of a light path crossing the unit square [0, 1] × [0, 1], where the
x- (resp. y-)coordinate encodes the tape contents to the left (resp. right) of the head. Need
either quadratic-surface mirrors or refraction to simulate transitions.

N-body problem [Smith 2006]: Similar idea.

Skolem-Pisot reachability: Given an integer vector x and an integer matrix A, does A^n x =
(0, ...) for any integer n? [Halava, Harju, Hirvensalo, Karhumäki 2005] It's surprising that this
problem is undecidable; the similar mortality problem for one matrix is not.



7.8 Wang Tiles 






Turing machine simulation is straightforward. Small Turing-complete tile sets via affine maps 
(via two-stack machines) are a little harder. 



7.9 Combinator Calculus 

In the 1920s, Moses Schönfinkel developed what can now be interpreted as a model of computation,
now called combinator calculus or combinatory logic. Combinator calculus operates on terms,
where every term is either one of a finite number of combinators (represented here by upper
case letters) or an ordered pair of terms. For notational convenience, we omit commas between
components of every pair and parentheses around the left term in every pair. Thus, SKK(IS) is
shorthand for the term (((S, K), K), (I, S)).

We can "evaluate" any term by a sequence of rewriting rules that depend on its first primitive
combinator. Schönfinkel defined three primitive combinators with the following evaluation rules:

• Identity: Ix ↦ x

• Constant: Kxy ↦ x

• Substitution: Sxyz ↦ xz(yz)

Here, x, y, and z are variables representing unknown but arbitrary terms. "Computation" in 
the combinator calculus is performed by repeatedly evaluating arbitrary (sub)terms with one of 
these three structures, until all such (sub)terms are gone. 

For example, the term S(K(SI))Kxy (for any terms x and y) evaluates as follows:

S(K(SI))Kxy ↦ K(SI)x(Kx)y     Substitution
            ↦ SI(Kx)y         Constant
            ↦ Iy(Kxy)         Substitution
            ↦ y(Kxy)          Identity
            ↦ yx              Constant

Thus, we can define a new combinator R := S(K(SI))K that upon evaluation reverses the next
two terms: Rxy ↦ yx.

On the other hand, evaluating SII(S(KI)(SII)) leads to an infinite loop:

SII(S(KI)(SII)) ↦ I(S(KI)(SII))(I(S(KI)(SII)))      Substitution
                ↦ S(KI)(SII)(I(S(KI)(SII)))         Identity
                ↦ S(KI)(SII)(S(KI)(SII))            Identity
                ↦ KI(S(KI)(SII))(SII(S(KI)(SII)))   Substitution
                ↦ I(SII(S(KI)(SII)))                Constant
                ↦ SII(S(KI)(SII))                   Identity

Wikipedia sketches a direct undecidability proof. Is there a Turing-completeness proof that
avoids λ-calculus?



Exercises

1. A tag-Turing machine has two heads: one can only read, the other can only write. Initially,
the read head is located at the left end of the tape, and the write head is located at the
first blank after the input string. At each transition, the read head can either move one cell
to the right or stay put, but the write head must write a symbol to its current cell and move
one cell to the right. Neither head can ever move to the left.

Prove that any standard Turing machine can be simulated by a tag-Turing machine.
That is, given any standard Turing machine M, describe a tag-Turing machine M' that
accepts and rejects exactly the same input strings as M.

2. *(a) Prove that any standard Turing machine can be simulated by a Turing machine with
only three states. [Hint: Use the tape to store an encoding of the state of the machine
yours is simulating.]

*(b) Prove that any standard Turing machine can be simulated by a Turing machine with
only two states.

3. A two-dimensional Turing machine uses an infinite two-dimensional grid of cells as
the tape; at each transition, the head can move from its current cell to any of its
four neighbors on the grid. The transition function of such a machine has the form
δ: Q × Γ → Q × Γ × {↑, ←, ↓, →}, where the arrows indicate which direction the head should
move.

(a) Prove that any two-dimensional Turing machine can be simulated by a standard
Turing machine.

(b) Suppose further that we endow our two-dimensional Turing machine with the
following additional actions, in addition to moving the head:

• Insert row: Move all symbols on or above the row containing the head up one
row, leaving the head's row blank.

• Insert column: Move all symbols on or to the right of the column containing the
head one column to the right, leaving the head's column blank.

• Delete row: Move all symbols above the row containing the head down one row,
deleting the head's row of symbols.

• Delete column: Move all symbols to the right of the column containing the head
one column to the left, deleting the head's column of symbols.

Show that any two-dimensional Turing machine that can also insert and delete rows and
columns can be simulated by a standard Turing machine.

4. A binary-tree Turing machine uses an infinite binary tree as its tape; that is, every cell in 
the tape has a left child and a right child. At each step, the head moves from its current 
cell to its Parent, its Left child, or to its Right child. Thus, the transition function of such a 
machine has the form δ: Q × Γ → Q × Γ × {P, L, R}. The input string is initially given along
the left spine of the tape. 

Show that any binary-tree Turing machine can be simulated by a standard Turing 
machine. 

5. A stack-tape Turing machine uses a semi-infinite tape, where every cell is actually the
top of an independent stack. The behavior of the machine at each iteration is governed by 
its internal state and the symbol at the top of the current cell's stack. At each transition, 
the head can optionally push a new symbol onto the stack, or pop the top symbol off the 
stack. (If a stack is empty, its "top symbol" is a blank and popping has no effect.) 

Show that any stack-tape Turing machine can be simulated by a standard Turing 
machine. (Compare with Problem ??!) 

6. A tape-stack Turing machine has two actions that modify its work tape, in addition to 
simply writing individual cells: it can save the entire tape by pushing it onto a stack, and it
can restore the entire tape by popping it off the stack. Restoring a tape returns the content 
of every cell to its content when the tape was saved. Saving and restoring the tape do not 
change the machine's state or the position of its head. If the machine attempts to "restore" 
the tape when the stack is empty, the machine crashes. 

Show that any tape-stack Turing machine can be simulated by a standard Turing 
machine. 



*** 



• Tape alphabet = ℕ.

  - Read: zero or positive. Write: +1, −1

  - Read: even or odd. Write: +1, −1, ×2, ÷2

  - Read: positive, negative, or zero. Write: x + y (merge), x − y (merge), 1, 0

• Never three times in a row in the same direction

• Hole-punch TM: tape alphabet {□, ■}, and only □ ↦ ■ transitions allowed.



Caveat lector: This is the zeroth (draft) edition of this lecture note. Please send bug reports
and suggestions to jeffe@illinois.edu.



I said in my haste, All men are liars.

— Psalms 116:11 (King James Version)

"Yields falsehood when preceded by its quotation"
yields falsehood when preceded by its quotation.

— Willard V. Quine, "Paradox", Scientific American (1962)

Some problems are so complex that you have to be highly intelligent and well informed 
just to be undecided about them. 

— Laurence Johnston Peter, Peter's Almanac (September 24, 1982) 

"Proving or disproving a formula — once you've encrypted the formula into numbers, 
that is — is just a calculation on that number. So it means that the answer to the question
is, no! Some formulas cannot be proved or disproved by any mechanical process! So I 
guess there's some point in being human after all!" 

Alan looked pleased until Lawrence said this last thing, and then his face collapsed.
"Now there you go making unwarranted assumptions."

— Neal Stephenson, Cryptonomicon (1999) 

No matter how P might perform, Q will scoop it:
Q uses P's output to make P look stupid.
Whatever P says, it cannot predict Q: 
P is right when it's wrong, and is false when it's true! 

— Geoffrey S. Pullum, "Scooping the Loop Sniffer" (2000) 

This castle is in unacceptable condition! UNACCEPTABLE!!

— Earl of Lemongrab [Justin Roiland], "Too Young" 
Adventure Time (August 8, 2011) 



8 Undecidability 

Perhaps the single most important result in Turing's remarkable 1936 paper is his solution to 
Hilbert's Entscheidungsproblem, which asked for a general automatic procedure to determine 
whether a given statement of first-order logic is provable. Turing proved that no such procedure 
exists; there is no systematic way to distinguish between statements that cannot be proved even 
in principle and statements whose proofs we just haven't found yet. 

8.1 Acceptable versus Decidable 

Recall that there are three possible outcomes for a Turing machine M running on any particular 
input string w: acceptance, rejection, and divergence. Every Turing machine M immediately 
defines four different languages (over the input alphabet Σ of M):

• The accepting language Accept(M) := {w ∈ Σ* | M accepts w}

• The rejecting language Reject(M) := {w ∈ Σ* | M rejects w}

• The halting language Halt(M) := Accept(M) ∪ Reject(M)

• The diverging language Diverge(M) := Σ* \ Halt(M)

For any language L, the sentence "M accepts L" means Accept(M) = L, and the sentence "M
decides L" means Accept(M) = L and Diverge(M) = ∅.

Now let L be an arbitrary language. We say that L is acceptable (or semi-computable, or 
semi-decidable, or recognizable, or listable, or recursively enumerable) if some Turing machine 
accepts L, and unacceptable otherwise. Similarly, L is decidable (or computable, or recursive) if 
some Turing machine decides L, and undecidable otherwise. 

8.2 Lo, I Have Become Death, Stealer of Pie 

There is a subtlety in the definitions of "acceptable" and "decidable" that many beginners miss:
A language can be decidable even if we can't exhibit a specific Turing machine that decides it. As a
canonical example, consider the language Π = {w | 1^|w| appears in the binary expansion of π}.
Despite appearances, this language is decidable! There are only two cases to consider:

• Suppose there is an integer N such that the binary expansion of π contains the substring
1^N but does not contain the substring 1^(N+1). Let M_N be the Turing machine with N + 3
states {0, 1, ..., N, accept, reject}, start state 0, and the following transition function:

    δ(q, a) = accept           if a = □
              reject           if a ≠ □ and q = N
              (q + 1, a, +1)   otherwise

This machine correctly decides Π.

• Suppose the binary expansion of π contains arbitrarily long substrings of 1s. Then any
Turing machine that accepts all inputs correctly decides Π.

We have no idea which of these machines correctly decides Π, but one of them does, and that's
enough! 
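
The two cases can be mirrored in a short Python sketch. Membership in Π depends only on the length of w, so every candidate decider is trivial; what we cannot do is say which candidate is the correct one. The names `make_M_N` and `accept_all` are illustrative, not part of the notes, and the value N = 5 is an arbitrary placeholder, not a claim about π.

```python
def make_M_N(N):
    """Candidate decider for the first case: accept w exactly when |w| <= N."""
    def M_N(w):
        q = 0                  # state = number of input symbols scanned so far
        for _ in w:            # the head is on a non-blank symbol
            if q == N:         # a != blank and q = N: the input is too long
                return False   # reject
            q += 1             # move right and advance the counter
        return True            # the head reached a blank: accept
    return M_N

def accept_all(w):
    """Candidate decider for the second case (unbounded runs of 1s)."""
    return True

M_5 = make_M_N(5)  # N = 5 is a hypothetical value, chosen only for illustration
```

Exactly one of `accept_all`, `make_M_N(0)`, `make_M_N(1)`, ... decides Π; the sketch only shows that each candidate always halts.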

8.3 Useful Lemmas 

This subsection lists several simple but useful properties of (un)decidable and (un)acceptable
languages. For almost all of these properties, the proofs are straightforward; readers are strongly 
encouraged to try to prove each lemma themselves before reading ahead. 

One might reasonably ask why we don't also define "rejectable" and "haltable" languages. 
The following lemma, whose proof is an easy exercise (hint, hint), implies that these are both 
identical to the acceptable languages. 

Lemma 1. Let M be an arbitrary Turing machine. 

(a) There is a Turing machine M^R such that Accept(M^R) = Reject(M) and Reject(M^R) =
Accept(M).

(b) There is a Turing machine M^A such that Accept(M^A) = Accept(M) and Reject(M^A) = ∅.

(c) There is a Turing machine M^H such that Accept(M^H) = Halt(M) and Reject(M^H) = ∅.

The decidable languages have several fairly obvious useful properties.

Lemma 2. If L and L′ are decidable, then L ∪ L′, L ∩ L′, L \ L′, and L′ \ L are also decidable.




Proof: Let M and M′ be Turing machines that decide L and L′, respectively. We can build a
Turing machine M_∪ that decides L ∪ L′ as follows. First, M_∪ copies its input string w onto a
second tape. Then M_∪ runs M on input w (on the first tape), and then runs M′ on input w (on
the second tape). If either M or M′ accepts, then M_∪ accepts; if both M and M′ reject, then M_∪
rejects.

The other three languages are similar. □ 
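
As a rough illustration of this proof, the sketch below models a decider as a Python function that always returns True (accept) or False (reject). The combinators and the two example deciders are hypothetical names introduced only for this sketch.

```python
def decide_union(M, M2):
    """M_union: run M, then run M2; accept if either one accepts."""
    def M_union(w):
        first = M(w)     # both runs terminate because M and M2 are deciders
        second = M2(w)
        return first or second
    return M_union

def decide_intersection(M, M2):
    """Accept exactly when both deciders accept."""
    return lambda w: M(w) and M2(w)

def decide_difference(M, M2):
    """Accept exactly when M accepts and M2 rejects."""
    return lambda w: M(w) and not M2(w)

# Two toy deciders, assumed only for the example.
def even_length(w):
    return len(w) % 2 == 0

def starts_with_1(w):
    return w.startswith("1")
```

Running the machines one after the other is safe here only because neither can diverge; the next lemma shows what changes when they can.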

Corollary 3. The following hold for all languages L and L′.

(a) If L ∩ L′ is undecidable and L′ is decidable, then L is undecidable.

(b) If L ∪ L′ is undecidable and L′ is decidable, then L is undecidable.

(c) If L \ L′ is undecidable and L′ is decidable, then L is undecidable.

(d) If L′ \ L is undecidable and L′ is decidable, then L is undecidable.

The asymmetry between acceptance and rejection implies that merely acceptable languages 
are not quite as well-behaved as decidable languages. 

Lemma 4. For all acceptable languages L and L′, the languages L ∪ L′ and L ∩ L′ are also acceptable.

Proof: Let M and M′ be Turing machines that accept L and L′, respectively. We can build a
Turing machine M_∩ that accepts L ∩ L′ as follows. First, M_∩ copies its input string w onto a
second tape. Then M_∩ runs M on input w using the first tape, and then runs M′ on input w
using the second tape. If both M and M′ accept, then M_∩ accepts; if either M or M′ rejects, then
M_∩ rejects; if either M or M′ diverges, then M_∩ diverges (automatically).

The construction for L ∪ L′ is more subtle; instead of running M and M′ in series, we must
run them in parallel. Like M_∩, the new machine M_∪ starts by copying its input string w onto
a second tape. But then M_∪ runs M and M′ simultaneously, with each step of M_∪ simulating
both one step of M on the first tape and one step of M′ on the second. Ignoring the states and
transitions needed for initialization, the state set of M_∪ is the product of the state sets of M and
M′, and the transition function is

    δ_∪((q, q′), (a, a′)) := accept_∪                     if q = accept or q′ = accept′
                             reject_∪                     if q = reject and q′ = reject′
                             ((r, r′), (b, b′), (Δ, Δ′))  otherwise, where δ(q, a) = (r, b, Δ) and
                                                          δ′(q′, a′) = (r′, b′, Δ′), and a machine
                                                          that has already rejected stays put

Thus, M_∪ accepts as soon as either M or M′ accepts, and rejects only after both M and M′
reject. □

Lemma 5. An acceptable language L is decidable if and only if Σ* \ L is also acceptable.

Proof: Let M and M′ be Turing machines that accept L and Σ* \ L, respectively. Following the
previous proof, we construct a new Turing machine M* that copies its input onto a second tape,
and then simulates M and M′ in parallel on the two tapes. If M accepts, then M* accepts; if M′
accepts, then M* rejects. Since every string is accepted by either M or M′, we conclude that M*
decides L.

The other direction follows immediately from Lemma 1. □
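
The parallel simulation used in the proofs of Lemmas 4 and 5 can be sketched in Python. This is only a model under stated assumptions: a "machine" is a generator function that yields once per step and returns True (accept) or False (reject), and a machine that never returns diverges. All names below are hypothetical.

```python
def advance(run):
    """Execute one step of a simulation; return its verdict, or None if it is still running."""
    try:
        next(run)
        return None
    except StopIteration as halt:
        return halt.value

def accept_union(M, M2):
    """Accept as soon as either machine accepts; reject only after both reject."""
    def M_union(w):
        runs = [M(w), M2(w)]
        verdicts = [None, None]
        while True:                        # loops forever if no verdict is ever reached
            for i, run in enumerate(runs):
                if verdicts[i] is None:    # a halted machine stays put
                    verdicts[i] = advance(run)
            if True in verdicts:
                return True
            if verdicts == [False, False]:
                return False
    return M_union

def decide_from_acceptors(M, M_co):
    """Lemma 5: M accepts L and M_co accepts the complement, so one of them must accept."""
    def M_star(w):
        run, run_co = M(w), M_co(w)
        while True:
            if advance(run) is True:
                return True
            if advance(run_co) is True:
                return False
    return M_star

def accepts_if_contains(symbol):
    """Toy acceptor: accepts once it scans `symbol`, and diverges otherwise."""
    def M(w):
        for a in w:
            yield
            if a == symbol:
                return True
        while True:
            yield
    return M

def accepts_length_parity(parity):
    """Toy acceptor: accepts strings whose length has the given parity, else diverges."""
    def M(w):
        yield
        if len(w) % 2 == parity:
            return True
        while True:
            yield
    return M
```

Running either toy acceptor to completion on its own could loop forever; interleaving one step at a time is what guarantees an answer whenever at least one machine accepts.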



8.4 Self-Haters Gonna Self-Hate

Let U be an arbitrary fixed universal Turing machine. Any Turing machine M can be encoded
as a string ⟨M⟩ of symbols from U's input alphabet, so that U can simulate the execution of
M on any suitably encoded input string. Different universal Turing machines require different
encodings.¹

A Turing machine encoding is just a string, and any string (over the correct alphabet) can
be used as the input to a Turing machine. Thus, we can use the encoding ⟨M⟩ of any Turing
machine M as the input to another Turing machine. We've already seen an example of this ability
in our universal Turing machine U, but more significantly, we can use ⟨M⟩ as the input to the
same Turing machine M. Thus, each of the following languages is well-defined:

    SelfAccept  := {⟨M⟩ | M accepts ⟨M⟩}
    SelfReject  := {⟨M⟩ | M rejects ⟨M⟩}
    SelfHalt    := {⟨M⟩ | M halts on ⟨M⟩}
    SelfDiverge := {⟨M⟩ | M diverges on ⟨M⟩}

One of Turing's key observations is that SelfReject is undecidable; Turing proved this theorem
by contradiction as follows:

Suppose to the contrary that there is a Turing machine SR such that Accept(SR) =
SelfReject and Diverge(SR) = ∅. More explicitly, for any Turing machine M,

• SR accepts ⟨M⟩ ⟺ M rejects ⟨M⟩, and

• SR rejects ⟨M⟩ ⟺ M does not reject ⟨M⟩.

In particular, these equivalences must hold when M is equal to SR. Thus,

• SR accepts ⟨SR⟩ ⟺ SR rejects ⟨SR⟩, and

• SR rejects ⟨SR⟩ ⟺ SR does not reject ⟨SR⟩.

In short, SR accepts ⟨SR⟩ if and only if SR rejects ⟨SR⟩, which is impossible! The only logical
conclusion is that the Turing machine SR does not exist!



8.5 Aside: Uncountable Barbers 

Turing's proof by contradiction is nearly identical to the famous diagonalization argument that 
uncountable sets exist, published by Georg Cantor in 1891. Indeed, SelfReject is sometimes 
called "the diagonal language". Recall that a function f : A → B is a surjection² if f(A) = {f(a) |
a ∈ A} = B.

Cantor's Theorem. Let f : X → 2^X be an arbitrary function from an arbitrary set X to its power
set. This function f is not a surjection.

¹ In fact, these undecidability proofs never actually use the universal Turing machine; all we really need is an
encoding function that associates a unique string ⟨M⟩ with every Turing machine M. However, we do need the
encoding to be compatible with a universal Turing machine for the results in Section ??.

² More commonly, flouting all reasonable standards of grammatical English, "an onto function".

Proof: Fix an arbitrary function f : X → 2^X. Call an element x ∈ X happy if x ∈ f(x) and sad
if x ∉ f(x). Let Y be the set of all sad elements of X; that is, for every element x ∈ X, we have

    x ∈ Y ⟺ x ∉ f(x).

For the sake of argument, suppose f is a surjection. Then (by definition of surjection) there must
be an element y ∈ X such that f(y) = Y. Then for every element x ∈ X, we have

    x ∈ f(y) ⟺ x ∉ f(x).

In particular, the previous equivalence must hold when x = y:

    y ∈ f(y) ⟺ y ∉ f(y).

We have a contradiction! We conclude that f is not a surjection after all. □
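
For a finite set X, the argument can be checked exhaustively. The sketch below (hypothetical helper names, with an arbitrary three-element X) enumerates every function f from X to its power set and confirms that the set Y of sad elements is never in the image of f. It is a sanity check of the diagonal construction, not a replacement for the proof, which also covers infinite sets.

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of X, as frozensets."""
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def sad_set(X, f):
    """Y = {x in X : x not in f(x)}, the set of sad elements."""
    return frozenset(x for x in X if x not in f[x])

def no_function_hits_its_sad_set(X):
    """Try every f : X -> 2^X and confirm that Y is never a value of f."""
    subsets = powerset(X)
    for images in product(subsets, repeat=len(X)):
        f = dict(zip(X, images))       # one candidate function f
        if sad_set(X, f) in images:    # this would contradict Cantor's argument
            return False
    return True

X = (0, 1, 2)  # an arbitrary small set, assumed only for the example
```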

Now let X = Σ*, and define the function f : X → 2^X as follows:

    f(w) := Accept(M)   if w = ⟨M⟩ for some Turing machine M
            ∅           if w is not the encoding of a Turing machine

Cantor's theorem immediately implies that not all languages are acceptable.

Alternatively, let X be the set of all Turing machines that halt on all inputs. For any Turing
machine M ∈ X, let f(M) be the set of all Turing machines N ∈ X such that M accepts the
encoding ⟨N⟩. Then a Turing machine M is sad if it rejects its own encoding ⟨M⟩; thus, Y is
essentially the set SelfReject. Cantor's argument now immediately implies that no Turing
machine decides the language SelfReject.

The core of Cantor's diagonalization argument also appears in the "barber paradox" popularized
by Bertrand Russell in the 1910s. In a certain small town, every resident has a haircut on
Haircut Day. Some residents cut their own hair; others have their hair cut by another resident of 
the same town. To obtain an official barber's license, a resident must cut the hair of all residents 
who don't cut their own hair, and no one else. Given these assumptions, we can immediately 
conclude that there are no licensed barbers. After all, who would cut the barber's hair? 

To map Russell's barber paradox back to Cantor's theorem, let X be the set of residents, and
let f(x) be the set of residents who have their hair cut by x; then a resident is sad if they do not
cut their own hair. To prove that SelfReject is undecidable, replace "resident" with "a Turing
machine that halts on all inputs", and replace "A cuts B's hair" with "A accepts ⟨B⟩".



8.6 Just Don't Know What to Do with Myself 

Similar diagonal arguments imply that the other three languages SelfAccept, SelfHalt, and
SelfDiverge are also undecidable. The proofs are not quite as direct for these three languages
as the proof for SelfReject; each fictional deciding machine requires a small modification to
create the contradiction.

Theorem 6. SelfAccept is undecidable.

Proof: For the sake of argument, suppose there is a Turing machine SA such that Accept(SA) =
SelfAccept and Diverge(SA) = ∅. Let SA^R be the Turing machine obtained from SA by
swapping its accept and reject states (as in the proof of Lemma 1). Then Reject(SA^R) =
SelfAccept and Diverge(SA^R) = ∅. It follows that SA^R rejects ⟨SA^R⟩ if and only if SA^R accepts
⟨SA^R⟩, which is impossible. □

Theorem 7. SelfHalt is undecidable.

Proof: Suppose to the contrary that there is a Turing machine SH such that Accept(SH) =
SelfHalt and Diverge(SH) = ∅. Let SH^X be the Turing machine obtained from SH by
redirecting every transition to accept to a new hanging state hang, and then redirecting every
transition to reject to accept. Then Accept(SH^X) = Σ* \ SelfHalt and Reject(SH^X) = ∅.
It follows that SH^X accepts ⟨SH^X⟩ if and only if SH^X does not halt on ⟨SH^X⟩, and we have a
contradiction. □

Theorem 8. SelfDiverge is unacceptable and therefore undecidable.

Proof: Suppose to the contrary that there is a Turing machine SD such that Accept(SD) =
SelfDiverge. Let SD^A be the Turing machine obtained from SD by redirecting every transition
to reject to a new hanging state hang such that δ(hang, a) = (hang, a, +1) for every symbol a.
Then Accept(SD^A) = SelfDiverge and Reject(SD^A) = ∅. It follows that SD^A accepts ⟨SD^A⟩ if
and only if SD^A does not halt on ⟨SD^A⟩, which is impossible. □

*8.7 Nevertheless, Acceptable 

Our undecidability argument for SelfDiverge actually implies the stronger result that
SelfDiverge is unacceptable; we never assumed that the hypothetical accepting machine SD halts
on all inputs. However, we can use or modify our universal Turing machine to accept the other
three languages.

Theorem 9. SelfAccept is acceptable.

Proof: We describe a Turing machine SA that accepts the language SelfAccept. Given any
string w as input, SA first verifies that w is the encoding of a Turing machine. If w is not the
encoding of a Turing machine, then SA diverges. Otherwise, w = ⟨M⟩ for some Turing machine
M; in this case, SA writes the string ww = ⟨M⟩⟨M⟩ onto its tape and passes control to the
universal Turing machine U. U then simulates M (the machine encoded by the first half of its
input) on the string ⟨M⟩ (the second half of its input).³ In particular, U accepts ⟨M, M⟩ if and
only if M accepts ⟨M⟩. We conclude that SA accepts ⟨M⟩ if and only if M accepts ⟨M⟩. □

Theorem 10. SelfReject is acceptable.

Proof: Let U^R be the Turing machine obtained from our universal machine U by swapping the
accept and reject states. We describe a Turing machine SR that accepts the language SelfReject
as follows. SR first verifies that its input string w is the encoding of a Turing machine and
diverges if not. Otherwise, SR writes the string ww = ⟨M, M⟩ onto its tape and passes control to
the reversed universal Turing machine U^R. Then U^R accepts ⟨M, M⟩ if and only if M rejects ⟨M⟩.
We conclude that SR accepts ⟨M⟩ if and only if M rejects ⟨M⟩. □

Finally, because SelfHalt is the union of two acceptable languages, SelfHalt is also
acceptable.

³ To simplify the presentation, I am implicitly assuming here that ⟨M⟩ = ⟨⟨M⟩⟩. Without this assumption, we need
a Turing machine that transforms an arbitrary string w ∈ Σ_M* into its encoding ⟨w⟩ for U; building such a Turing
machine is straightforward.

8.8 The Halting Problem via Reduction

Consider the following related languages:⁴

    Accept  := {⟨M, w⟩ | M accepts w}
    Reject  := {⟨M, w⟩ | M rejects w}
    Halt    := {⟨M, w⟩ | M halts on w}
    Diverge := {⟨M, w⟩ | M diverges on w}

Deciding the language Halt is what is usually meant by the halting problem: Given a program 
M and an input w to that program, does the program halt? This problem may seem trivial; why 
not just run the program and see? More formally, why not just pass the input string ⟨M, w⟩ to
our universal Turing machine U? That strategy works perfectly if we just want to accept Halt, 
but we actually want to decide Halt; if M is not going to halt on w, we still want an answer in a 
finite amount of time. Sadly, we can't always get what we want. 

Theorem 11. Halt is undecidable. 

Proof: Suppose to the contrary that there is a Turing machine H that decides Halt. Then we 
can use H to build another Turing machine SH that decides the language SelfHalt. Given any 
string w, the machine SH first verifies that w = ⟨M⟩ for some Turing machine M (rejecting if
not), then writes the string ww = ⟨M, M⟩ onto the tape, and finally passes control to H. But
SelfHalt is undecidable, so no such machine SH exists. We conclude that H does not exist 
either. □ 

Nearly identical arguments imply that the languages Accept, Reject, and Diverge are 
undecidable. 

Here we have our first example of an undecidability proof by reduction. Specifically, we 
reduced the language SelfHalt to the language Halt. More generally, to reduce one language 
X to another language Y, we assume (for the sake of argument) that there is a program P_Y that
decides Y, and we write another program that decides X, using P_Y as a black-box subroutine.
If later we discover that Y is decidable, we can immediately conclude that X is decidable.
Equivalently, if we later discover that X is undecidable, we can immediately conclude that Y is
undecidable. 
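
The shape of this reduction can be sketched in Python. Everything here is a toy: the three "machine encodings" and the oracle `toy_H` are hypothetical, and `toy_H` exists only because this three-machine model is far too weak to be universal. For real Turing machines no such H exists, which is the whole point of the theorem. The part worth reading is `make_SH`, which uses H purely as a black box.

```python
# ASSUMPTION: a toy universe with exactly three machine encodings.
TOY_MACHINES = {"halt", "loop", "halt-on-short"}

def is_encoding(w):
    """Does w encode one of the toy machines?"""
    return w in TOY_MACHINES

def toy_H(M, w):
    """Toy halting oracle for the toy universe (no real counterpart exists)."""
    if M == "halt":
        return True            # halts on every input
    if M == "loop":
        return False           # diverges on every input
    return len(w) < 5          # "halt-on-short" halts only on short inputs

def make_SH(H):
    """Build a decider SH for SelfHalt from any decider H for Halt."""
    def SH(w):
        if not is_encoding(w):
            return False       # reject strings that encode no machine
        return H(w, w)         # pass <M, M> to the black box H
    return SH

SH = make_SH(toy_H)
```

Because `make_SH` never looks inside H, any decider for Halt would yield a decider for SelfHalt in the same way.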



    To prove that a language L is undecidable,
    reduce a known undecidable language to L.

Perhaps the most confusing aspect of reduction arguments is that the languages we want to
prove undecidable nearly (but not quite) always involve encodings of Turing machines, while at
the same time, the programs that we build to prove them undecidable are also Turing machines.
Our proof that Halt is undecidable involved three different machines:

• The hypothetical Turing machine H that decides Halt.

• The new Turing machine SH that decides SelfHalt, using H as a subroutine.

• The Turing machine M whose encoding is the input to H.

It is incredibly easy to get confused about which machines are playing each role in the proof.
Therefore, it is absolutely vital that we give each machine in a reduction proof a unique and mnemonic
name, and then always refer to each machine by name. Never write, say, or even think "the
machine" or "that machine" or (gods forbid) "it". You also may find it useful to think of the
working programs we are trying to construct (H and SH in this proof) as being written in a
different language than the arbitrary source code that we want those programs to analyze (⟨M⟩
in this proof).

⁴ Sipser uses the shorter name A_TM instead of Accept, but uses HALT_TM instead of Halt. I have no idea why he
thought four-letter names are okay, but six-letter names are not. His subscript TM is just a reminder that these are
languages of Turing machine encodings, as opposed to encodings of DFAs or some other machine model.

8.9 One Million Years Dungeon! 

As a more complex set of examples, consider the following languages: 

    NeverAccept  := {⟨M⟩ | Accept(M) = ∅}
    NeverReject  := {⟨M⟩ | Reject(M) = ∅}
    NeverHalt    := {⟨M⟩ | Halt(M) = ∅}
    NeverDiverge := {⟨M⟩ | Diverge(M) = ∅}

Theorem 12. NeverAccept is undecidable. 

Proof: Suppose to the contrary that there is a Turing machine NA that decides NeverAccept.
Then by swapping the accept and reject states, we obtain a Turing machine NA^R that decides
the complementary language Σ* \ NeverAccept.

To reach a contradiction, we construct a Turing machine A that decides Accept as follows.
Given the encoding ⟨M, w⟩ of an arbitrary machine M and an arbitrary string w as input, A writes
the encoding ⟨M_w⟩ of a new Turing machine M_w that ignores its input, writes w onto the tape,
and then passes control to M. Finally, A passes the new encoding ⟨M_w⟩ as input to NA^R. The
following cartoon tries to illustrate the overall construction.

[Figure: the machine A. The input ⟨M, w⟩ enters a "Build" box that outputs ⟨M_w⟩, which is fed
to NA^R (the machine NA with its accept and reject outputs swapped).]

A reduction from Accept to NeverAccept, which proves NeverAccept undecidable.

Before going any further, it may be helpful to list the various Turing machines that appear in
this construction.

• The hypothetical Turing machine NA that decides NeverAccept.

• The Turing machine NA^R that decides Σ* \ NeverAccept, which we constructed by
modifying NA.

• The Turing machine A that we are building, which decides Accept using NA^R as a black-box
subroutine.

• The Turing machine M, whose encoding is part of the input to A.

• The Turing machine M_w whose encoding A constructs from ⟨M, w⟩ and then passes to NA^R
as input.

Now let M be an arbitrary Turing machine and w be an arbitrary string, and suppose we run
our new Turing machine A on the encoding ⟨M, w⟩. To complete the proof, we need to consider
two cases: Either M accepts w or M does not accept w.

• First, suppose M accepts w.

- Then for all strings x, the machine M_w accepts x.

- So Accept(M_w) = Σ*, by the definition of Accept(M_w).

- So ⟨M_w⟩ ∉ NeverAccept, by definition of NeverAccept.

- So NA rejects ⟨M_w⟩, because NA decides NeverAccept.

- So NA^R accepts ⟨M_w⟩, by construction of NA^R.

- We conclude that A accepts ⟨M, w⟩, by construction of A.

• On the other hand, suppose M does not accept w, either rejecting or diverging instead.

- Then for all strings x, the machine M_w does not accept x.

- So Accept(M_w) = ∅, by the definition of Accept(M_w).

- So ⟨M_w⟩ ∈ NeverAccept, by definition of NeverAccept.

- So NA accepts ⟨M_w⟩, because NA decides NeverAccept.

- So NA^R rejects ⟨M_w⟩, by construction of NA^R.

- We conclude that A rejects ⟨M, w⟩, by construction of A.

In short, A decides the language Accept, which is impossible. We conclude that NA does not 
exist. □ 
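
The data flow of this reduction can be sketched in Python, with machines modeled as ordinary functions from strings to True/False. The stand-in `toy_NA` is a loudly labeled assumption: it is correct only for machines that halt and ignore their input (which every M_w built from a halting toy M does), and no genuine decider for NeverAccept exists. All names are hypothetical.

```python
def make_M_w(M, w):
    """Build M_w: ignore the input x, then run M on the hard-coded string w."""
    def M_w(x):
        return M(w)
    return M_w

def make_A(NA):
    """Build A from a (hypothetical) decider NA for NeverAccept."""
    def NA_R(machine):                 # NA with accept and reject swapped
        return not NA(machine)
    def A(M, w):
        return NA_R(make_M_w(M, w))    # A accepts <M, w> iff M_w accepts something
    return A

def toy_NA(machine):
    """Toy stand-in for NA. ASSUMPTION: `machine` halts and ignores its input,
    so probing the single input "" reveals whether it accepts anything."""
    return not machine("")

def starts_with_0(s):
    """An arbitrary toy machine M that always halts."""
    return s.startswith("0")

A = make_A(toy_NA)
```

The key step is `make_M_w`: it turns a question about one input of M into a question about the whole language of M_w.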

Again, similar arguments imply that the languages NeverReject, NeverHalt, and
NeverDiverge are undecidable. In each case, the core of the argument is describing how to transform
the incoming machine-and-input encoding ⟨M, w⟩ into the encoding of an appropriate new
Turing machine ⟨M_w⟩.

Now that we know that NeverAccept and its relatives are undecidable, we can use them as 
the basis of further reduction proofs. Here is a typical example: 

Theorem 13. The language DivergeSame := {⟨M_1⟩⟨M_2⟩ | Diverge(M_1) = Diverge(M_2)} is undecidable.

Proof: Suppose for the sake of argument that there is a Turing machine DS that decides
DivergeSame. Then we can build a Turing machine ND that decides NeverDiverge as follows.
Fix a Turing machine Y that accepts Σ* (for example, by defining δ(start, a) = (accept, ·, ·) for
all a ∈ Γ). Given an arbitrary Turing machine encoding ⟨M⟩ as input, ND writes the string
⟨M⟩⟨Y⟩ onto the tape and then passes control to DS. There are two cases to consider:

• If DS accepts ⟨M⟩⟨Y⟩, then Diverge(M) = Diverge(Y) = ∅, so ⟨M⟩ ∈ NeverDiverge.

• If DS rejects ⟨M⟩⟨Y⟩, then Diverge(M) ≠ Diverge(Y) = ∅, so ⟨M⟩ ∉ NeverDiverge.

In short, ND accepts ⟨M⟩ if and only if ⟨M⟩ ∈ NeverDiverge, which is impossible. We conclude
that DS does not exist. □



8.10 Rice's Theorem 

In 1953, Henry Rice proved the following extremely powerful theorem, which essentially states 
that every interesting question about the language accepted by a Turing machine is undecidable. 

Rice's Theorem. Let ℒ be any set of languages that satisfies the following conditions:

• There is a Turing machine Y such that Accept(Y) ∈ ℒ.

• There is a Turing machine N such that Accept(N) ∉ ℒ.

The language AcceptIn(ℒ) := {⟨M⟩ | Accept(M) ∈ ℒ} is undecidable.

Proof: Without loss of generality, suppose ∅ ∉ ℒ. (A symmetric argument establishes the theorem
in the opposite case ∅ ∈ ℒ.) Fix an arbitrary Turing machine Y such that Accept(Y) ∈ ℒ.

Suppose to the contrary that there is a Turing machine A_ℒ that decides AcceptIn(ℒ). To
derive a contradiction, we describe a Turing machine H that decides the halting language Halt,
using A_ℒ as a black-box subroutine. Given the encoding ⟨M, w⟩ of an arbitrary Turing machine
M and an arbitrary string w as input, H writes the encoding ⟨WTF⟩ of a new Turing machine
WTF that executes the following algorithm:

    WTF(x):
        run M on input w (and discard the result)
        run Y on input x

H then passes the new encoding ⟨WTF⟩ to A_ℒ.

Now let M be an arbitrary Turing machine and w be an arbitrary string, and suppose we run
our new Turing machine H on the encoding ⟨M, w⟩. There are two cases to consider.

• Suppose M halts on input w.

- Then for all strings x, the machine WTF accepts x if and only if Y accepts x.

- So Accept(WTF) = Accept(Y), by definition of Accept(·).

- So Accept(WTF) ∈ ℒ, by definition of Y.

- So A_ℒ accepts ⟨WTF⟩, because A_ℒ decides AcceptIn(ℒ).

- So H accepts ⟨M, w⟩, by definition of H.

• Suppose M does not halt on input w.

- Then for all strings x, the machine WTF does not halt on input x, and therefore does
not accept x.

- So Accept(WTF) = ∅, by definition of Accept(WTF).

- So Accept(WTF) ∉ ℒ, by our assumption that ∅ ∉ ℒ.

- So A_ℒ rejects ⟨WTF⟩, because A_ℒ decides AcceptIn(ℒ).

- So H rejects ⟨M, w⟩, by definition of H.

In short, H decides the language Halt, which is impossible. We conclude that A_ℒ does not
exist. □ 
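
The machine WTF can be sketched in Python, under the modeling assumption that a machine is a generator function that yields once per step and returns True (accept) or False (reject); a generator that never returns diverges. The helper `run` and the three toy machines are hypothetical. Note that `run` only simulates for a bounded number of steps, so it cannot decide halting; it merely lets us observe the two cases of the proof.

```python
def make_WTF(M, w, Y):
    """Build WTF: run M on w and discard the result, then behave like Y on x."""
    def WTF(x):
        yield from M(w)             # never finishes if M diverges on w
        return (yield from Y(x))    # WTF's verdict is exactly Y's verdict
    return WTF

def run(machine, x, max_steps):
    """Bounded simulation: the verdict if machine halts within max_steps, else None."""
    sim = machine(x)
    for _ in range(max_steps):
        try:
            next(sim)
        except StopIteration as halt:
            return halt.value
    return None

def halts_quickly(w):
    """Toy machine that halts (rejecting) after one step."""
    yield
    return False

def loops_forever(w):
    """Toy machine that diverges on every input."""
    while True:
        yield

def Y(x):
    """Toy machine standing in for Y: accepts strings that start with 1."""
    yield
    return x.startswith("1")
```

When M halts on w, WTF accepts exactly the strings Y accepts; when M diverges, WTF accepts nothing, which is the dichotomy the proof exploits.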

The set ℒ in the statement of Rice's Theorem is often called a property of languages, rather
than a set, to avoid the inevitable confusion about sets of sets. We can also think of ℒ as a
decision problem about languages, where the languages are represented by Turing machines
that accept or decide them. Rice's theorem states that the only properties of languages that are
decidable are the trivial properties "Does this Turing machine accept an acceptable language?"
(Answer: Yes, by definition.) and "Does this Turing machine accept Discover?" (Answer: No,
because Discover is a credit card, not a language.)

Rice's Theorem makes it incredibly easy to prove that language properties are undecidable;
we only need to exhibit one acceptable language that has the property and another acceptable
language that does not. In fact, most proofs using Rice's theorem can use at least one of the
following Turing machines:

• M_Accept accepts every string, by defining δ(start, a) = accept for every tape symbol a.

• M_Reject rejects every string, by defining δ(start, a) = reject for every tape symbol a.

• M_Diverge diverges on every string, by defining δ(start, a) = (start, a, +1) for every tape
symbol a.



Corollary 14. Each of the following languages is undecidable.

(a) {⟨M⟩ | M accepts given an empty initial tape}
(b) {⟨M⟩ | M accepts the string UIUC}
(c) {⟨M⟩ | M accepts exactly three strings}
(d) {⟨M⟩ | M accepts all palindromes}
(e) {⟨M⟩ | Accept(M) is regular}
(f) {⟨M⟩ | Accept(M) is not regular}
(g) {⟨M⟩ | Accept(M) is undecidable}
(h) {⟨M⟩ | Accept(M) = Accept(N)}, for some arbitrary fixed Turing machine N.



Proof: In all cases, undecidability follows from Rice's theorem. 



(a) Let ℒ be the set of all languages that contain the empty string. Then AcceptIn(ℒ) = {⟨M⟩ |
M accepts given an empty initial tape}.

• Given an empty initial tape, M_Accept accepts, so Accept(M_Accept) ∈ ℒ.

• Given an empty initial tape, M_Diverge does not accept, so Accept(M_Diverge) ∉ ℒ.

Therefore, Rice's Theorem implies that AcceptIn(ℒ) is undecidable.

(b) Let ℒ be the set of all languages that contain the string UIUC.

• M_Accept accepts UIUC, so Accept(M_Accept) ∈ ℒ.

• M_Diverge does not accept UIUC, so Accept(M_Diverge) ∉ ℒ.

Therefore, AcceptIn(ℒ) = {⟨M⟩ | M accepts the string UIUC} is undecidable by Rice's Theorem.

(c) There is a Turing machine that accepts the language {larry, curly, moe}. On the other
hand, M_Reject does not accept exactly three strings.

(d) M_Accept accepts all palindromes, and M_Reject does not accept all palindromes.

(e) M_Reject accepts the regular language ∅, and there is a Turing machine M_{0^n 1^n} that accepts
the non-regular language {0^n 1^n | n ≥ 0}.

(f) M_Reject accepts the regular language ∅, and there is a Turing machine M_{0^n 1^n} that accepts
the non-regular language {0^n 1^n | n ≥ 0}.⁵

(g) M_Reject accepts the decidable language ∅, and there is a Turing machine that accepts the
undecidable language SelfReject.

(h) The Turing machine N accepts Accept(N) by definition. The Turing machine N^R, obtained
by swapping the accept and reject states of N, accepts the language Halt(N) \ Accept(N) =
Reject(N), which differs from Accept(N) unless both languages are empty; in that case,
M_Accept accepts Σ* ≠ Accept(N). □

We can also use Rice's theorem as a component in more complex undecidability proofs, where 
the target language consists of more than just a single Turing machine encoding. 

Theorem 15. The language L := {⟨M, w⟩ | M accepts w^k for every integer k ≥ 0} is undecidable. 

Proof: Fix an arbitrary string w, and let ℒ be the set of all languages that contain w^k for all k. 
Then Accept(M_Accept) = Σ* ∈ ℒ and Accept(M_Reject) = ∅ ∉ ℒ. Thus, even if the string w is 
fixed in advance, no Turing machine can decide L. □ 

Nearly identical reduction arguments imply the following variants of Rice's theorem. (The 
names of these theorems are not standard.) 

Rice's Rejection Theorem. Let ℒ be any set of languages that satisfies the following conditions: 

• There is a Turing machine Y such that Reject(Y) ∈ ℒ. 

• There is a Turing machine N such that Reject(N) ∉ ℒ. 

The language RejectIn(ℒ) := {⟨M⟩ | Reject(M) ∈ ℒ} is undecidable. 

Rice's Halting Theorem. Let ℒ be any set of languages that satisfies the following conditions: 

• There is a Turing machine Y such that Halt(Y) ∈ ℒ. 

• There is a Turing machine N such that Halt(N) ∉ ℒ. 

The language HaltIn(ℒ) := {⟨M⟩ | Halt(M) ∈ ℒ} is undecidable. 

Rice's Divergence Theorem. Let ℒ be any set of languages that satisfies the following conditions: 

• There is a Turing machine Y such that Diverge(Y) ∈ ℒ. 

• There is a Turing machine N such that Diverge(N) ∉ ℒ. 

The language DivergeIn(ℒ) := {⟨M⟩ | Diverge(M) ∈ ℒ} is undecidable. 

Rice's Decision Theorem. Let ℒ be any set of languages that satisfies the following conditions: 

• There is a Turing machine Y that decides a language in ℒ. 

• There is a Turing machine N that decides a language not in ℒ. 

The language DecideIn(ℒ) := {⟨M⟩ | M decides a language in ℒ} is undecidable. 



5 Yes, parts (e) and (f) have exactly the same proof. 
As a final sanity check, always be careful to distinguish the following objects: 

• The string ε 

• The language ∅ 

• The language {ε} 

• The language property ∅ 

• The language property {∅} 

• The language property {{ε}} 

• The Turing machine M_Reject that rejects every string and therefore decides the language ∅. 

• The Turing machine M_Diverge that diverges on every string and therefore accepts the 
language ∅. 

*8.11 The Rice-McNaughton-Myhill-Shapiro Theorem 

The following subtle generalization of Rice's theorem precisely characterizes which properties 
of acceptable languages are acceptable. This result was partially proved by Henry Rice in 1953, 
in the same paper that proved Rice's Theorem; Robert McNaughton, John Myhill, and Norman 
Shapiro completed the proof a few years later, each independently from the other two. 6 

The Rice-McNaughton-Myhill-Shapiro Theorem. Let ℒ be an arbitrary set of acceptable languages. 
The language AcceptIn(ℒ) := {⟨M⟩ | Accept(M) ∈ ℒ} is acceptable if and only if ℒ 
satisfies the following conditions: 

(a) ℒ is monotone: For any language L ∈ ℒ, every superset of L is also in ℒ. 

(b) ℒ is compact: Every language in ℒ has a finite subset that is also in ℒ. 

(c) ℒ is finitely acceptable: The language {⟨L⟩ | L ∈ ℒ and L is finite} is acceptable. 7 

I won't give a complete proof of this theorem (in part because it requires techniques I haven't 
introduced), but the following lemma is arguably the most interesting component: 

Lemma 16. Let ℒ be a set of acceptable languages. If ℒ is not monotone, then AcceptIn(ℒ) is 
unacceptable. 

Proof: Suppose to the contrary that there is a Turing machine M_ℒ that accepts AcceptIn(ℒ). 
Using this Turing machine as a black box, we describe a Turing machine SD that accepts the 
unacceptable language SelfDiverge. Fix two Turing machines Y and N such that 

Accept(Y) ∈ ℒ, 
Accept(N) ∉ ℒ, 
and Accept(Y) ⊆ Accept(N). 

Let w be the input to SD. After verifying that w = ⟨M⟩ for some Turing machine M 
(and rejecting otherwise), SD writes the encoding ⟨WTF⟩ of a new Turing machine WTF that 
implements the following algorithm: 

6 McNaughton never published his proof (although he did announce the result); consequently, this theorem is 
sometimes called "The Rice-Myhill-Shapiro Theorem". Even more confusingly, Myhill published his proof twice, once 
in a paper with John Shepherdson and again in a later paper with Jacob Dekker. So maybe it should be called the 
Rice-Dekker-Myhill-McNaughton-Myhill-Shepherdson-Shapiro Theorem. 

7 Here the encoding ⟨L⟩ of a finite language L ⊆ Σ* is exactly the string that you would write down to explicitly 
describe L. Formally, ⟨L⟩ is the unique string over the alphabet Σ ∪ { {, ,, }, ε } that contains the strings in L in 
lexicographic order, separated by commas , and surrounded by braces {}, with ε representing the empty string. For 
example, ⟨{ε, 0, 01, 0110, 01101001}⟩ = {ε,0,01,0110,01101001}. 
WTF(x): 
    write x to second tape 
    write ⟨M⟩ to third tape 
    in parallel: 
        run Y on the first tape 
        run N on the second tape 
        run M on the third tape 
    if Y accepts x 
        accept 
    if N accepts x and M halts on ⟨M⟩ 
        accept 



Finally, SD passes the new encoding ⟨WTF⟩ to M_ℒ, and accepts if and only if M_ℒ accepts. 
There are two cases to consider: 

• If M halts on ⟨M⟩, then Accept(WTF) = Accept(N) ∉ ℒ, and therefore M_ℒ does not 
accept ⟨WTF⟩. 

• If M does not halt on ⟨M⟩, then Accept(WTF) = Accept(Y) ∈ ℒ, and therefore M_ℒ 
accepts ⟨WTF⟩. 

In short, SD accepts SelfDiverge, which is impossible. We conclude that M_ℒ does not exist. □ 

Corollary 17. Each of the following languages is unacceptable. 

(a) {⟨M⟩ | Accept(M) is finite} 

(b) {⟨M⟩ | Accept(M) is infinite} 

(c) {⟨M⟩ | Accept(M) is regular} 

(d) {⟨M⟩ | Accept(M) is not regular} 

(e) {⟨M⟩ | Accept(M) is decidable} 

(f) {⟨M⟩ | Accept(M) is undecidable} 

(g) {⟨M⟩ | M accepts at least one string in SelfDiverge} 

(h) {⟨M⟩ | Accept(M) = Accept(N)}, for some arbitrary fixed Turing machine N. 



Proof: (a) The set of finite languages is not monotone: ∅ is finite; Σ* is not finite; both ∅ and 
Σ* are acceptable (in fact decidable); and ∅ ⊆ Σ*. 

(b) The set of infinite acceptable languages is not compact: No finite subset of the infinite 
acceptable language Σ* is infinite! 

(c) The set of regular languages is not monotone: Consider the languages ∅ and {0^n 1^n | n ≥ 0}. 

(d) The set of non-regular acceptable languages is not monotone: Consider the languages 
{0^n 1^n | n ≥ 0} and Σ*. 

(e) The set of decidable languages is not monotone: Consider the languages ∅ and SelfReject. 

(f) The set of undecidable acceptable languages is not monotone: Consider the languages 
SelfReject and Σ*. 

(g) The set ℒ = {L | L ∩ SelfDiverge ≠ ∅} is not finitely acceptable. For any string w, deciding 
whether {w} ∈ ℒ is equivalent to deciding whether w ∈ SelfDiverge, which is impossible. 

(h) If Accept(N) ≠ Σ*, then the set {Accept(N)} is not monotone. On the other hand, if 
Accept(N) = Σ*, then the set {Accept(N)} is not compact: No finite subset of Σ* is equal 
to Σ*! □ 
8.12 Turing Machine Behavior: It's Complicated 

Rice's theorems imply that every interesting question about the language that a Turing machine 
accepts — or more generally, the function that a program computes — is undecidable. A more subtle 
question is whether we can recognize Turing machines that exhibit certain internal behavior. 
Some behaviors we can recognize; others we can't. 

Theorem 18. The language NeverLeft := {⟨M, w⟩ | Given w as input, M never moves left} is 
decidable. 

Proof: Given the encoding ⟨M, w⟩, we simulate M with input w using our universal Turing 
machine U, but with the following termination conditions. If M ever moves its head to the left, 
then we reject. If M halts without moving its head to the left, then we accept. Finally, if M reads 
more than |Q| blanks, where Q is the state set of M, then we accept. If the first two cases do not 
apply, M only moves to the right; moreover, after reading the entire input string, M only reads 
blanks. Thus, after reading |Q| blanks, it must repeat some state, and therefore loop forever 
without moving to the left. The three cases are exhaustive. □ 
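
The decision procedure in this proof can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the notes: the machine is given as a dictionary `delta` mapping (state, symbol) to (state, symbol, move), the blank is written "_", the halting states are named "accept" and "reject", and a missing table entry is treated as halting.

```python
BLANK = "_"  # assumed stand-in for the blank symbol


def never_moves_left(delta, states, start, w, halting=("accept", "reject")):
    """Return True if the machine never moves left on input w, False otherwise."""
    state, head, blanks_read = start, 0, 0
    while state not in halting:
        # The head has only ever moved right, so this cell still holds
        # its original content: a symbol of w, or a blank.
        symbol = w[head] if head < len(w) else BLANK
        if symbol == BLANK:
            blanks_read += 1
            if blanks_read > len(states):
                return True   # some state repeats on blanks: loops to the right forever
        if (state, symbol) not in delta:
            return True       # assumption: no entry means the machine halts here
        state, _written, move = delta[(state, symbol)]
        if move == -1:
            return False      # the machine moved left
        head += 1
    return True               # halted without ever moving left


# Hypothetical example machines.
always_right = {("s", "0"): ("s", "0", +1), ("s", BLANK): ("s", BLANK, +1)}
turns_back = {("s", "0"): ("s", "0", +1), ("s", BLANK): ("t", BLANK, -1)}
```

The loop always terminates, because every iteration either returns or moves the head one cell to the right, and at most |w| + |Q| + 1 cells are read before the blank counter triggers.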

Theorem 19. The language LeftThree := {⟨M, w⟩ | Given w as input, M eventually moves left 
three times in a row} is undecidable. 

Proof: Given ⟨M⟩, we build a new Turing machine M' that accepts the same language as M and 
moves left three times in a row if and only if it accepts, as follows. For each non-accepting state 
p of M, the new machine M' has three states p_1, p_2, p_3, with the following transitions: 

δ'(p_1, a) = (q_2, b, Δ), where (q, b, Δ) = δ(p, a) and q ≠ accept 
δ'(p_2, a) = (p_3, a, +1) 
δ'(p_3, a) = (p_1, a, −1) 

In other words, after each non-accepting transition, M' moves once to the right and then once to 
the left. For each transition to accept, M' has a sequence of seven transitions: three steps to the 
right, then three steps to the left, and then finally accept', all without modifying the tape. (The 
three steps to the right ensure that M' does not fall off the left end of the tape.) 

By construction, M' moves left three times in a row if and only if M accepts w. Thus, if we could 
decide LeftThree, we could also decide Accept, which is impossible. □ 

There is no hard and fast rule like Rice's theorem to distinguish decidable behaviors from 
undecidable behaviors, but I can offer two rules of thumb. 

• If it is possible to simulate an arbitrary Turing machine while avoiding the target behavior, 
then the behavior is not decidable. For example: there is no algorithm to determine 
whether a given Turing machine reenters its start state, or revisits the left end of the tape, 
or writes a blank. 

• If a Turing machine with the target behavior is limited to a finite number of configurations, 
or is guaranteed to force an infinite loop after a finite number of transitions, then the 
behavior is likely to be decidable. For example, there are algorithms to determine whether 
a given Turing machine ever leaves its start state, or reads its entire input string, or writes 
a non-blank symbol over a blank. 
Exercises 

1. Let M be an arbitrary Turing machine. 

(a) Describe a Turing machine M^R such that 

Accept(M^R) = Reject(M) and Reject(M^R) = Accept(M). 

(b) Describe a Turing machine M^A such that 

Accept(M^A) = Accept(M) and Reject(M^A) = ∅. 

(c) Describe a Turing machine M^H such that 

Accept(M^H) = Halt(M) and Reject(M^H) = ∅. 

2. (a) Prove that Accept is undecidable. 

(b) Prove that Reject is undecidable. 

(c) Prove that Diverge is undecidable. 

3. (a) Prove that NeverReject is undecidable. 

(b) Prove that NeverHalt is undecidable. 

(c) Prove that NeverDiverge is undecidable. 

4. Prove that each of the following languages is undecidable. 

(a) AlwaysAccept := {⟨M⟩ | Accept(M) = Σ*} 

(b) AlwaysReject := {⟨M⟩ | Reject(M) = Σ*} 

(c) AlwaysHalt := {⟨M⟩ | Halt(M) = Σ*} 

(d) AlwaysDiverge := {⟨M⟩ | Diverge(M) = Σ*} 

5. Let ℒ be a non-empty proper subset of the set of acceptable languages. Prove that the 
following languages are undecidable: 

(a) RejectIn(ℒ) := {⟨M⟩ | Reject(M) ∈ ℒ} 

(b) HaltIn(ℒ) := {⟨M⟩ | Halt(M) ∈ ℒ} 

(c) DivergeIn(ℒ) := {⟨M⟩ | Diverge(M) ∈ ℒ} 

6. For each of the following decision problems, either sketch an algorithm or prove that the 
problem is undecidable. Recall that w^R denotes the reversal of string w. For each problem, 
the input is the encoding ⟨M⟩ of a Turing machine M. 

(a) Does M accept ⟨M⟩^R? 

(b) Does M reject any palindrome? 

(c) Does M accept all palindromes? 

(d) Does M diverge only on palindromes? 

(e) Is there an input string that forces M to move left? 

(f) Is there an input string that forces M to move left three times in a row? 

(g) Does M accept the encoding of any Turing machine N such that Accept(N) = 
SelfDiverge? 

7. For each of the following decision problems, either sketch an algorithm or prove that the 
problem is undecidable. Recall that w^R denotes the reversal of string w. For each problem, 
the input is an encoding ⟨M, w⟩ of a Turing machine M and its input string w. 

(a) Does M accept the string ww^R? 

(b) Does M accept either w or w^R? 

(c) Does M either accept w or reject w^R? 

(d) Does M accept the string w^k for some integer k? 

(e) Does M accept w in at most 2^|w| steps? 

(f) If we run M on input w, does M ever change a symbol on its tape? 

(g) If we run M on input w, does M ever move to the right? 

(h) If we run M on input w, does M ever move to the right twice in a row? 

(i) If we run M on input w, does M move its head to the right more than 2^|w| times (not 
necessarily consecutively)? 

(j) If we run M with input w, does M ever change a □ on the tape to any other symbol? 

(k) If we run M with input w, does M ever change a □ on the tape to 1? 

(l) If we run M with input w, does M ever write a □? 

(m) If we run M with input w, does M ever leave its start state? 

(n) If we run M with input w, does M ever reenter its start state? 

(o) If we run M with input w, does M ever reenter a state that it previously left? That is, 
are there states p ≠ q such that M moves from state p to state q and then later moves 
back to state p? 

8. Let M be a Turing machine, let w be an arbitrary input string, and let s and t be positive 
integers. We say that M accepts w in space s if M accepts w after accessing at 
most the first s cells on the tape, and M accepts w in time t if M accepts w after at most t 
transitions. 

(a) Prove that the following languages are decidable: 

i. {⟨M, w⟩ | M accepts w in time |w|^2} 

ii. {⟨M, w⟩ | M accepts w in space |w|^2} 

(b) Prove that the following languages are undecidable: 

i. {⟨M⟩ | M accepts at least one string w in time |w|^2} 

ii. {⟨M⟩ | M accepts at least one string w in space |w|^2} 
9. Let L_0 be an arbitrary language. For any integer i > 0, define the language 

L_i := {⟨M⟩ | M decides L_{i−1}}. 

For which integers i > 0 is L_i decidable? Obviously the answer depends on the initial 
language L_0; give a complete characterization of all possible cases. Prove your answer is 
correct. [Hint: This question is a lot easier than it looks!] 

10. Argue that each of the following decision problems about programs in your favorite 
programming language is undecidable. 

(a) Does this program correctly compute Fibonacci numbers? 

(b) Can this program fall into an infinite loop? 

(c) Will the value of this variable ever change? 

(d) Will this program ever attempt to dereference a null pointer? 

(e) Does this program free every block of memory that it dynamically allocates? 

(f) Is any statement in this program unreachable? 

(g) Do these two programs compute the same function? 

*11. Call a Turing machine conservative if it never writes over its input string. More formally, a 
Turing machine is conservative if for every transition δ(p, a) = (q, b, Δ) where a ∈ Σ, we 
have b = a; and for every transition δ(p, a) = (q, b, Δ) where a ∉ Σ, we have b ∉ Σ. 

(a) Prove that if M is a conservative Turing machine, then Accept(M) is a regular 
language. 

(b) Prove that the language {⟨M⟩ | M is conservative and M accepts ε} is undecidable. 

Together, these two results imply that every conservative Turing machine accepts the same 
language as some DFA, but it is impossible to determine which DFA. 

*12. (a) Prove that it is undecidable whether a given C++ program is syntactically correct. 
[Hint: Use templates!] 

(b) Prove that it is undecidable whether a given ANSI C program is syntactically correct. 
[Hint: Use the preprocessor!] 

(c) Prove that it is undecidable whether a given Perl program is syntactically correct. 
[Hint: Does that slash character / delimit a regular expression or represent division?] 



© Copyright 2014 Jeff Erickson. 
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). 
Free distribution is strongly encouraged; commercial distribution is expressly forbidden. 
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision. 
Caveat lector: This is the zeroth (draft) edition of this lecture note. In particular, some topics 
still need to be written. Please send bug reports and suggestions to jeffe@illinois.edu. 



If first you don't succeed, then try and try again. 

And if you don't succeed again, just try and try and try. 

— Marc Blitzstein, "Useless Song", The Three Penny Opera (1954) 
Adaptation of Bertolt Brecht, "Das Lied von der Unzulänglichkeit 
menschlichen Strebens", Die Dreigroschenoper (1928) 

Children need encouragement. 

If a kid gets an answer right, tell him it was a lucky guess. 
That way he develops a good, lucky feeling. 
— Jack Handey, "Deep Thoughts", Saturday Night Live (March 21, 1992) 



9 Nondeterministic Turing Machines 
9.1 Definitions 

In his seminal 1936 paper, Turing also defined an extension of his "automatic machines" that 
he called choice machines, which are now more commonly known as nondeterministic Turing 
machines. The execution of a nondeterministic Turing machine is not determined entirely by its 
input and its transition function; rather, at each step of its execution, the machine can choose 
from a set of possible transitions. The distinction between deterministic and nondeterministic 
Turing machines exactly parallels the distinction between deterministic and nondeterministic 
finite-state automata. 

Formally, a nondeterministic Turing machine has all the components of a standard deterministic 
Turing machine: a finite tape alphabet Γ that contains the input alphabet Σ and a blank 
symbol □; a finite set Q of internal states with special start, accept, and reject states; and a 
transition function δ. However, the transition function now has the signature 

δ : Q × Γ → 2^(Q × Γ × {−1, +1}). 

That is, for each state p and tape symbol a, the output δ(p, a) of the transition function is a set 
of triples of the form (q, b, Δ) ∈ Q × Γ × {−1, +1}. Whenever the machine finds itself in state p 
reading symbol a, the machine chooses an arbitrary triple (q, b, Δ) ∈ δ(p, a), and then changes 
its state to q, writes b to the tape, and moves the head by Δ. If the set δ(p, a) is empty, the 
machine moves to the reject state and halts. 

The set of all possible transition sequences of a nondeterministic Turing machine N on a 
given input string w defines a rooted tree, called a computation tree. The initial configuration 
(start, w, 0) is the root of the computation tree, and the children of any configuration (q, x, i) 
are the configurations that can be reached from (q, x, i) in one transition. In particular, any 
configuration whose state is accept or reject is a leaf. For deterministic Turing machines, this 
computation tree is just a single path, since there is at most one valid transition from every 
configuration. 
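
As a rough illustration of these definitions, the following Python sketch stores a nondeterministic transition function as a dictionary of sets and lists the children of a configuration (q, x, i) in the computation tree. The representation is an assumption made for this example only: the dictionary `delta`, the "_" blank, the state names, and the choice to keep the head in place at the left end of the tape are not taken from the notes.

```python
BLANK = "_"  # assumed stand-in for the blank symbol


def children(delta, config):
    """Return the configurations reachable from config in one transition."""
    state, tape, head = config
    if state in ("accept", "reject"):
        return []                          # halting configurations are leaves
    symbol = tape[head] if head < len(tape) else BLANK
    choices = delta.get((state, symbol), set())
    if not choices:
        return [("reject", tape, head)]    # empty transition set: reject and halt
    result = []
    for q, b, move in sorted(choices):     # sorted only to make the order reproducible
        cells = list(tape) + [BLANK] * (head + 1 - len(tape))
        cells[head] = b
        # Assumption: a left move at the left end leaves the head where it is.
        result.append((q, "".join(cells), max(head + move, 0)))
    return result


# Hypothetical machine with a binary choice when reading 0 in the start state.
toy_delta = {("start", "0"): {("start", "0", +1), ("accept", "0", +1)}}
root = ("start", "00", 0)
```

Calling `children(toy_delta, root)` yields two children, one per element of the transition set; for a deterministic machine every such list would have at most one element.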

9.2 Acceptance and Rejection 

Unlike deterministic Turing machines, there is a fundamental asymmetry between the acceptance 
and rejection criteria for nondeterministic Turing machines. Let N be any nondeterministic 
Turing machine, and let w be any string. 

• N accepts w if and only if there is at least one sequence of valid transitions from the initial 
configuration (start, w, 0) that leads to the accept state. Equivalently N accepts w if the 
computation tree contains at least one accept leaf. 

• N rejects w if and only if every sequence of valid transitions from the initial configuration 
(start, w, 0) leads to the reject state. Equivalently, N rejects w if every path through the 
computation tree ends with a reject leaf. 

In particular, N can accept w even when there are choices that allow the machine to run forever, 
but rejection requires N to halt after only a finite number of transitions, no matter what choices 
it makes along the way. Just as for deterministic Turing machines, it is possible that N neither 
accepts nor rejects w. 

Acceptance and rejection of languages are defined exactly as they are for deterministic 
machines. A nondeterministic Turing machine N accepts a language L ⊆ Σ* if N accepts all 
strings in L and nothing else; N rejects L if N rejects every string in L and nothing else; and 
finally, N decides L if N accepts L and rejects Σ* \ L. 



9.3 Time and Space Complexity 



• Define "time" and "space". 

• TIME(f(n)) is the class of languages that can be decided by a deterministic multi-tape 
Turing machine in O(f(n)) time. 

• NTIME(f(n)) is the class of languages that can be decided by a nondeterministic multi-tape 
Turing machine in O(f(n)) time. 

• SPACE(f(n)) is the class of languages that can be decided by a deterministic multi-tape 
Turing machine in O(f(n)) space. 

• NSPACE(f(n)) is the class of languages that can be decided by a nondeterministic 
multi-tape Turing machine in O(f(n)) space. 

• Why multi-tape TMs? Because t steps on any k-tape Turing machine can be simulated in 
O(t log t) steps on a two-tape machine [Hennie and Stearns 1966, essentially using lazy 
counters and amortization], and in O(t^2) steps on a single-tape machine [Hartmanis 
and Stearns 1965; realign multiple tracks at every simulation step]. Moreover, the latter 
quadratic bound is tight [Hennie 1965 (palindromes, via communication complexity)]. 



9.4 Deterministic Simulation 

Theorem 1. For any nondeterministic Turing machine N, there is a deterministic Turing machine 
M that accepts exactly the same strings as N and rejects exactly the same strings as N. Moreover, 
if every computation path of N on input x halts after at most t steps, then M halts on input x after 
at most O(t^2 r^{2t}) steps, where r is the maximum size of any transition set in N. 

Proof: I'll describe a deterministic machine M that performs a breadth-first search of the 
computation tree of N. (The depth-first search performed by a standard recursive backtracking 
algorithm won't work here. If N's computation tree contains an infinite path, a depth-first search 
would get stuck in that path without exploring the rest of the tree.) 
At the beginning of each simulation round, M's tape contains a string of the form 

□□ ⋯ □ • • y_1 q_1 z_1 • y_2 q_2 z_2 • ⋯ • y_k q_k z_k • • □□ ⋯ 

where each substring y_i q_i z_i encodes a configuration (q_i, y_i z_i, |y_i|) of some computation path of 
N, and • is a new symbol not in the tape alphabet of N. The machine M interprets this sequence 
of encoded configurations as a queue, with new configurations inserted on the right and old 
configurations removed from the left. The double-separators • • uniquely identify the start and 
end of this queue; outside this queue, the tape is entirely blank. 

Specifically, in each round, first M appends the encodings of all configurations that N can 
reach in one transition from the first encoded configuration (q_1, y_1 z_1, |y_1|); then M erases the 
first encoded configuration. 

⋯ □□ • • y_1 q_1 z_1 • y_2 q_2 z_2 • ⋯ • y_k q_k z_k • • □□ ⋯ 

⇓ 

⋯ □□ ⋯ □ • • y_2 q_2 z_2 • ⋯ • y_k q_k z_k • y'_1 q'_1 z'_1 • y'_2 q'_2 z'_2 • ⋯ • y'_r q'_r z'_r • • □□ ⋯ 

Suppose each transition set δ_N(q, a) has size at most r. Then after simulating t steps of 
N, the tape string of M encodes O(r^t) different configurations of N and therefore has length 
L = O(t r^t) (not counting the initial blanks). If M begins each simulation phase by moving 
the initial configuration from the beginning to the end of the tape string, which takes O(t^2 r^t) 
time, the time for the rest of the simulation phase is negligible. Altogether, simulating all r^t 
possibilities for the t-th step of N requires O(t^2 r^{2t}) time. We conclude that M can simulate 
the first t steps of every computation path of N in O(t^2 r^{2t}) time, as claimed. □ 

The running time of this simulation is dominated by the time spent reading from one end of 
the tape string and writing to the other. It is fairly easy to reduce the running time to O(t r^t) by 
using either two tapes (a "read tape" containing N-configurations at time t and a "write tape" 
containing N-configurations at time t + 1) or two independent heads on the same tape (one at 
each end of the queue). 
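
At a higher level than tape strings, the queue discipline of this simulation might look like the following Python sketch. It is not the Turing machine M itself: configurations are Python tuples, the queue is a `collections.deque`, and the `budget` parameter is an assumption added here so that the sketch always terminates (the real simulation runs forever when N has an infinite path and no accepting one). The dictionary format of `delta` and the "_" blank are likewise assumptions.

```python
from collections import deque

BLANK = "_"  # assumed stand-in for the blank symbol


def bfs_simulate(delta, w, budget=10_000):
    """Breadth-first search of the computation tree of N on input w.

    Returns "accept", "reject", or None if the budget runs out first.
    """
    queue = deque([("start", w, 0)])
    for _ in range(budget):
        if not queue:
            return "reject"                 # every computation path rejected
        state, tape, head = queue.popleft() # remove the oldest configuration
        if state == "accept":
            return "accept"                 # one accepting leaf suffices
        if state == "reject":
            continue
        symbol = tape[head] if head < len(tape) else BLANK
        choices = delta.get((state, symbol), set())
        for q, b, move in sorted(choices):  # an empty set means this path rejects
            cells = list(tape) + [BLANK] * (head + 1 - len(tape))
            cells[head] = b
            # Assumption: the head stays put instead of falling off the left end.
            queue.append((q, "".join(cells), max(head + move, 0)))
    return "reject" if not queue else None


# Hypothetical machine: guesses a position holding a 1; one branch runs forever on blanks.
guess_one = {
    ("start", "0"): {("start", "0", +1)},
    ("start", "1"): {("start", "1", +1), ("accept", "1", +1)},
    ("start", BLANK): {("loop", BLANK, +1)},
    ("loop", BLANK): {("loop", BLANK, +1)},
}
```

On input "01" the search reaches an accepting configuration even though another branch of the same tree never halts, which is exactly the situation where a depth-first search could get stuck.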

9.5 Nondeterminism as Advice 

Any nondeterministic Turing machine N can also be simulated by a deterministic machine M 
with two inputs: the user input string w ∈ Σ*, and a so-called advice string x ∈ Ω*, where Ω is 
another finite alphabet. Only the first input string w is actually given by the user. At least for 
now, we assume that the advice string x is given on a separate read-only tape. 

The deterministic machine M simulates N step-by-step, but whenever N has a choice of 
how to transition, M reads a new symbol from the advice string, and that symbol determines 
the choice. In fact, without loss of generality, we can assume that M reads a new symbol from 
the advice string and moves the advice-tape's head to the right on every transition. Thus, M's 
transition function has the form δ_M : Q × Γ × Ω → Q × Γ × {−1, +1}, and we require that 

δ_N(q, a) = {δ_M(q, a, ω) | ω ∈ Ω}. 

For example, if N has a binary choice 

δ_N(branch, ?) = {(left, L, −1), (right, R, +1)}, 
then M might determine this choice by defining 

δ_M(branch, ?, 0) = (left, L, −1) and δ_M(branch, ?, 1) = (right, R, +1). 

More generally, if every set δ_N(p, a) has size r, then we let Ω = {1, 2, ..., r} and define δ_M(q, a, i) 
to be the ith element of δ_N(q, a) in some canonical order. 
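
One possible way to build such a table δ_M from δ_N is sketched below in Python. The details are assumptions made for illustration: advice symbols are numbered from 0 rather than 1 (as in the binary example), Python's tuple ordering plays the role of the canonical order, and transition sets smaller than r are padded by cycling through their elements so that every advice symbol still selects a valid choice.

```python
def add_advice(delta_n):
    """Turn delta_n[(q, a)] (a set of triples) into delta_m[(q, a, i)] (one triple)."""
    r = max((len(choices) for choices in delta_n.values()), default=0)
    delta_m = {}
    for (q, a), choices in delta_n.items():
        ordered = sorted(choices)          # canonical order (an assumption)
        if not ordered:
            continue                       # empty set: the machine rejects, no entry needed
        for i in range(r):
            # Cycle through the choices so that all r advice symbols are covered.
            delta_m[(q, a, i)] = ordered[i % len(ordered)]
    return r, delta_m


# The binary choice from the text.
delta_n = {("branch", "?"): {("left", "L", -1), ("right", "R", +1)}}
r, delta_m = add_advice(delta_n)
```

For this input, advice symbol 0 selects the transition to left and advice symbol 1 selects the transition to right, and for every pair (q, a) the set of values δ_M(q, a, ·) equals δ_N(q, a), which is the requirement stated above.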

Now observe that N accepts a string w if and only if M accepts the pair (w, x) for some string 
x ∈ Ω*, and N rejects w if and only if M rejects the pair (w, x) for all strings x ∈ Ω*. 

The "advice" formulation of nondeterminism allows a different strategy for simulation by a 
standard deterministic Turing machine, which is often called dovetailing. Consider all possible 
advice strings x, in increasing order of length; listing these advice strings is equivalent to 
repeatedly incrementing a base-r counter. For each advice string x, simulate M on input (w, x) 
for exactly |x| transitions. 

Dovetail_M(w): 
    for t ← 1 to ∞ 
        done ← True 
        for all strings x ∈ Ω^t 
            if M accepts (w, x) in at most t steps 
                accept 
            if M(w, x) does not halt in at most t steps 
                done ← False 
        if done 
            reject 

The most straightforward Turing-machine implementation of this algorithm requires three tapes: 
A read-only input tape containing w, an advice tape containing x (which is also used as a timer 
for the simulation), and the work tape. This simulation requires O(t r^t) time to simulate all 
possibilities for t steps of the original nondeterministic machine N. 

If we insist on using a standard Turing machine with a single tape and a single head, the 
simulation becomes slightly more complex, but (unlike our earlier queue-based strategy) not 
significantly slower. This standard machine S maintains a string of the form •w•x•z, where 
z is the current work-tape string of M (or equivalently of N), with marks (on a second track) 
indicating the current positions of the heads on M's work tape and M's advice tape. Simulating 
a single transition of M now requires O(|x|) steps, because S needs to shuttle its single head 
between these two marks. Thus, S requires O(t^2 r^t) time to simulate all possibilities for t 
steps of the original nondeterministic machine N. This is significantly faster than the queue-based 
simulation, because we don't record (and therefore don't have to repeatedly scan over) 
intermediate configurations; recomputing everything from scratch is actually cheaper! 
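
The Dovetail algorithm can be sketched in Python as follows. This is a high-level illustration rather than a tape-level implementation, and several details are assumptions: `delta` maps (state, symbol) to a set of triples, advice symbols are 0-based indices into the sorted transition set (taken modulo its size), the blank is "_", and `max_t` caps the outer loop so that the sketch always returns.

```python
from itertools import product

BLANK = "_"  # assumed stand-in for the blank symbol


def run_with_advice(delta, w, advice):
    """Follow the single computation path selected by advice, one symbol per step."""
    state, tape, head = "start", list(w), 0
    for omega in advice:
        if state in ("accept", "reject"):
            break
        if head == len(tape):
            tape.append(BLANK)
        choices = sorted(delta.get((state, tape[head]), set()))
        if not choices:
            return "reject"                 # empty transition set: reject and halt
        state, tape[head], move = choices[omega % len(choices)]
        head = max(head + move, 0)          # assumption: no falling off the left end
    return state if state in ("accept", "reject") else None


def dovetail(delta, w, r, max_t=12):
    """Try all advice strings of length 1, 2, ..., max_t, recomputing from scratch."""
    for t in range(1, max_t + 1):
        done = True
        for x in product(range(r), repeat=t):
            outcome = run_with_advice(delta, w, x)
            if outcome == "accept":
                return "accept"
            if outcome is None:             # this path has not halted within t steps
                done = False
        if done:
            return "reject"                 # every path of length t has rejected
    return None                             # undetermined within the cap


# Hypothetical machine: accepts strings containing a 1; one branch loops on blanks.
guess_one = {
    ("start", "0"): {("start", "0", +1)},
    ("start", "1"): {("start", "1", +1), ("accept", "1", +1)},
    ("start", BLANK): {("loop", BLANK, +1)},
    ("loop", BLANK): {("loop", BLANK, +1)},
}
```

Nothing is stored between rounds except the counter t, which mirrors the observation that recomputing each path from scratch avoids scanning over a long queue of saved configurations.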



9.6 The Cook-Levin Theorem 



*** 



Define Sat and CircuitSat. Non-determinism is fundamentally different from other Turing 
machine extensions, in that it seems to provide an exponential speedup for some problems, 
just like NFAs can use exponentially fewer states than DFAs for the same language. 



The Cook-Levin Theorem. If SAT ∈ P, then P = NP. 



Proof: Let L ⊆ Σ* be an arbitrary language in NP, over some fixed alphabet Σ. There must be 
an integer k and Turing machine M that satisfies the following conditions: 

• For all strings w ∈ L, there is at least one string x ∈ Σ* such that M accepts the string w□x. 

• For all strings w ∉ L and x ∈ Σ*, M rejects the string w□x. 

• For all strings w, x ∈ Σ*, M halts on input w□x after at most max{1, |w|^k} steps. 

Now suppose we are given a string w ∈ Σ*. Let n = |w| and let N = max{1, |w|^k}. We 
construct a boolean formula Φ_w that is satisfiable if and only if w ∈ L, by following the execution 
of M on input w□x for some unknown advice string x. Without loss of generality, we can assume 
that |x| = N − n − 1 (since we can extend any shorter string x with blanks). Our formula Φ_w uses 
the following boolean variables for all symbols a ∈ Γ, all states q ∈ Q, and all integers 0 ≤ t ≤ N 
and 0 ≤ i ≤ N + 1. 

• Q_{t,i,q}: M is in state q with its head at position i after t transitions. 

• T_{t,i,a}: The ith cell of M's work tape contains a after t transitions. 

The formula <J> W is the conjunction of the following constraints: 

• Boundaries: To simplify later constraints, we include artificial boundary variables just 
past both ends of the tape: 

Q_{t,0,q} = Q_{t,N+1,q} = False for all 0 ≤ t ≤ N and q ∈ Q 

T_{t,0,a} = T_{t,N+1,a} = False for all 0 ≤ t ≤ N and a ∈ Γ 

• Initialization: We have the following values for variables with t = 0: 

Q_{0,1,start} = True 
Q_{0,1,q} = False for all q ≠ start 
Q_{0,i,q} = False for all i ≠ 1 and q ∈ Q 
T_{0,i,w_i} = True for all 1 ≤ i ≤ n 
T_{0,i,a} = False for all 1 ≤ i ≤ n and a ≠ w_i 
T_{0,n+1,□} = True 
T_{0,n+1,a} = False for all a ≠ □ 



• Uniqueness: The variables T_{0,i,a} with n + 2 ≤ i ≤ N represent the unknown advice string 
x; these are the "inputs" to Φ_w. We need some additional constraints to ensure that for each 
i, exactly one of these variables is True: 

\[ \Bigl( \bigvee_{a \in \Gamma} T_{0,i,a} \Bigr) \wedge \bigwedge_{a \neq b} \bigl( \overline{T_{0,i,a}} \vee \overline{T_{0,i,b}} \bigr) \] 

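• Transitions: For all 1 ≤ t ≤ N and 1 ≤ i ≤ N, the following constraints simulate the 
transition from time t − 1 to time t. 

\[ Q_{t,i,q} = \bigvee_{\delta(p,a) = (q,\cdot,+1)} \bigl( Q_{t-1,i-1,p} \wedge T_{t-1,i-1,a} \bigr) \;\vee \bigvee_{\delta(p,a) = (q,\cdot,-1)} \bigl( Q_{t-1,i+1,p} \wedge T_{t-1,i+1,a} \bigr) \] 

\[ T_{t,i,b} = \bigvee_{\delta(p,a) = (\cdot,b,\cdot)} \bigl( Q_{t-1,i,p} \wedge T_{t-1,i,a} \bigr) \;\vee\; \Bigl( \bigwedge_{q \in Q} \overline{Q_{t-1,i,q}} \wedge T_{t-1,i,b} \Bigr) \] 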
Models of Computation 



Lecture 9: Nondeterministic Turing Machines [Fa'14] 



• Output: We have one final constraint that indicates acceptance. 

\[ z = \bigvee_{t=0}^{N} \bigvee_{i=1}^{N} Q_{t,i,\mathrm{accept}} \] 
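
To make the uniqueness constraint in the list above concrete, here is a small Python sketch that writes "exactly one of these variables is True" as a conjunction of clauses (one "at least one" clause plus one "not both" clause per pair) and checks it by brute force. The clause representation and the variable names are assumptions chosen for this example; they are not the encoding used in the proof.

```python
from itertools import combinations, product


def exactly_one_clauses(variables):
    """Return CNF clauses; each clause is a list of (variable, required_value) literals."""
    clauses = [[(v, True) for v in variables]]           # at least one variable is True
    for u, v in combinations(variables, 2):
        clauses.append([(u, False), (v, False)])         # u and v are not both True
    return clauses


def satisfies(assignment, clauses):
    """Check whether a dict {variable: bool} satisfies every clause."""
    return all(any(assignment[v] == want for v, want in clause) for clause in clauses)


def check_exactly_one(variables):
    """Brute-force check that the clauses hold exactly when one variable is True."""
    clauses = exactly_one_clauses(variables)
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if satisfies(assignment, clauses) != (sum(bits) == 1):
            return False
    return True


# Hypothetical names for the variables T_{0,i,a} of one tape cell i, with three symbols a.
cell_vars = ["T_0_i_0", "T_0_i_1", "T_0_i_blank"]
```

For a fixed tape alphabet the number of clauses per cell is a constant, 1 + |Γ|(|Γ| − 1)/2 in this encoding, which is why each such constraint contributes only constant length to the formula.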

A straightforward induction argument implies that without the acceptance constraint, any 
assignment of values to the unknown variables T_{0,i,a} that satisfies the uniqueness constraints 
determines unique values for the other variables in Φ_w, which consistently describe the execution 
of M. Thus, Φ_w is satisfiable if and only if, for some input values T_{0,i,a}, the resulting values 
satisfy every constraint, including acceptance. In other words, Φ_w is satisfiable if and only if 
there is a string x ∈ Γ* such that M accepts the input w□x. We conclude that Φ_w is satisfiable 
if and only if w ∈ L. 

For any string w of length n, the formula Φ_w has O(N^2) variables and O(N^2) constraints 
(where the hidden constants depend on the machine M). Every constraint except acceptance 
has constant length, so altogether Φ_w has length O(N^2). Moreover, we can construct Φ_w in 
O(N^2) = O(n^{2k}) time. 

In conclusion, if we could decide SAT for formulas of size n in (say) O(n^c) time, then we 
could decide membership in L in O(n^{2kc}) time, which implies that L ∈ P. □ 



Exercises 

1. Prove that the following problem is NP-hard, without using the Cook-Levin Theorem. Given 
a string ⟨M, w⟩ that encodes a nondeterministic Turing machine M and a string w, does 
M accept w in at most |w| transitions? 



*** 



More exercises! 